Kafka: Failed to create new KafkaAdminClient on Kerberized cluster

I have a Kerberized cluster with Kafka on it.
I want to use the Confluent Schema Registry with Kafka on the cluster.
Launching the Schema Registry from my local PC, everything works just fine.
But when I uploaded it to a machine in the cluster and tried to run it from there, I get:
Error
ERROR Server died unexpectedly: (io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain:50)
org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
...
Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Message stream modified (41)
...
Caused by: javax.security.auth.login.LoginException: Message stream modified (41)
I also tried to run it from another machine on the cluster and got the same result.
schema-registry.properties
listeners=http://0.0.0.0:8081
kafkastore.connection.url=master01.domain.ext:2181,master02.domain.ext:2181
kafkastore.bootstrap.servers=SASL_PLAINTEXT://xxx.domain.ext:6667,SASL_PLAINTEXT://xxx.domain.ext:6667
kafkastore.topic=_schemas
debug=true
kafkastore.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
useKeyTab=true \
storeKey=true \
keyTab="/path/schema-registry/etc/schema-registry/my-keytab.keytab" \
principal="kafka/xxx.domain.ext@DOMAIN.EXT";
kafkastore.sasl.kerberos.service.name=kafka
kafkastore.security.protocol=SASL_PLAINTEXT
kafkastore.sasl.mechanism=GSSAPI
EXECUTION COMMAND:
sudo bash bin/schema-registry-start ./etc/schema-registry/schema-registry.properties
QUESTIONS:
Why does it work on my local PC and not on a machine in the cluster?
What should I change?
P.S. I get the same result even when trying to run CMAK (the Yahoo kafka-manager), using the same jaas.config and the same keytab.
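A quick sanity check (just a sketch, reusing the keytab path and principal from the properties above) is to confirm that the keytab can actually obtain a ticket on the cluster machine, independently of the Schema Registry:
kinit -kt /path/schema-registry/etc/schema-registry/my-keytab.keytab kafka/xxx.domain.ext@DOMAIN.EXT
klist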

Related

Kafka connect doesn't find available brokers when volume attached

Symptom: a modified Bitnami Kafka image contains the Kafka Connect jars, and they work fine.
But once I add a volume for persistence, it can't find the existing brokers.
Details:
I modified the Bitnami image to copy the Connect jars and launch connect-distributed.sh.
It works fine; connectors can consume from and produce to the topics.
But once I add a persistent volume to the Kafka image, the first startup is OK but subsequent ones are not. connect.log says:
"[2020-05-21 15:59:34,786] ERROR [Worker clientId=connect-1, groupId=my-group1] Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:297)
org.apache.kafka.common.KafkaException: Unexpected error fetching metadata for topic connect-offsets
at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:403)
at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1965)
at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1933)
at org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:138)
at org.apache.kafka.connect.storage.KafkaOffsetBackingStore.start(KafkaOffsetBackingStore.java:109)
at org.apache.kafka.connect.runtime.Worker.start(Worker.java:186)
at org.apache.kafka.connect.runtime.AbstractHerder.startServices(AbstractHerder.java:123)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:284)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor is below 1 or larger than the number of available brokers."
Kafka itself still works well: every topic is present (replication factor of 1) and I can consume/produce messages by hand. I can also launch the Connect system by hand successfully.
edit: My guess is that without the PV it starts the connectors after Kafka is up, but with the PV it immediately sees that connectors are already present and tries to load them before Kafka has started (see the sketch after the run.sh excerpt below).
edit2:
modded image:
FROM bitnami/kafka
# copying connect jars..
ADD connect-distributed.properties /opt/prop/connect-distributed.properties
ADD modded-kafka-run.sh /opt/bitnami/scripts/kafka/run.sh
RUN chmod 755 /opt/bitnami/scripts/kafka/run.sh
modded run.sh (I just added connect-distributed.sh and the curl calls to it):
info "** Starting Kafka **"
/opt/bitnami/kafka/bin/connect-distributed.sh -daemon /opt/prop/connect-distributed.properties
# .. adding the connectors with curl
if am_i_root; then
exec gosu "$KAFKA_DAEMON_USER" "${START_COMMAND[@]}"
else
exec "${START_COMMAND[@]}"
fi
original run.sh: https://github.com/bitnami/bitnami-docker-kafka/blob/master/2/debian-10/rootfs/opt/bitnami/scripts/kafka/run.sh
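One way to test the ordering hypothesis from the edit above (a sketch only; the localhost:9092 bootstrap address and the Bitnami paths are assumptions) would be to launch Connect from a background subshell that waits until the broker answers, instead of starting it before the exec that starts Kafka:
(
  # wait until the broker started by the exec below is reachable
  until /opt/bitnami/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092 >/dev/null 2>&1; do
    sleep 5
  done
  /opt/bitnami/kafka/bin/connect-distributed.sh -daemon /opt/prop/connect-distributed.properties
  # .. then re-add the connectors with curl
) &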
Hard to tell what the issue is, but note that the ENTRYPOINT that starts Kafka only runs after any RUN command.
It's not clear why you need to create your own Kafka Connect image when at least two already exist.
You should be using docker-compose to start three separate Zookeeper, Kafka, and Connect services.

vertica integration with apache kafka error

Could you please help me with the following issue?
I have Vertica 9.3 installed on one host and Apache Kafka (2.4.0) installed on a second host. I want to integrate Vertica with Kafka. I configured everything following the official Vertica guide, but when I try to create the source:
vkconfig source --create --cluster kafka_weblog --source test --partitions 1 --conf /opt/vertica/packages/kafka/config/my.conf
I get this error:
Exception in thread "main" com.vertica.solutions.kafka.exception.ConfigurationException: ERROR: [[Vertica][VJDBC](5861) ERROR: Error calling processPartition() in User Function KafkaListTopics at [/data/qb_workspaces/jenkins2/ReleaseBuilds/Grader/REL-9_3_1-x_grader/build/udx/supported/kafka/KafkaUtil.cpp:163], error code: 0, message: Error getting metadata: [Local: Broker transport failure]]
at com.vertica.solutions.kafka.model.StreamSource.validateConfiguration(StreamSource.java:248)
at com.vertica.solutions.kafka.model.StreamSource.setFromMapAndValidate(StreamSource.java:194)
at com.vertica.solutions.kafka.model.StreamModel.<init>(StreamModel.java:93)
at com.vertica.solutions.kafka.model.StreamSource.<init>(StreamSource.java:44)
at com.vertica.solutions.kafka.cli.SourceCLI.getNewModel(SourceCLI.java:62)
at com.vertica.solutions.kafka.cli.SourceCLI.getNewModel(SourceCLI.java:13)
at com.vertica.solutions.kafka.cli.CLI.run(CLI.java:59)
at com.vertica.solutions.kafka.cli.CLI._main(CLI.java:141)
at com.vertica.solutions.kafka.cli.SourceCLI.main(SourceCLI.java:29)
Caused by: java.sql.SQLNonTransientException: [Vertica][VJDBC](5861) ERROR: Error calling processPartition() in User Function KafkaListTopics at [/data/qb_workspaces/jenkins2/ReleaseBuilds/Grader/REL-9_3_1-x_grader/build/udx/supported/kafka/KafkaUtil.cpp:163], error code: 0, message: Error getting metadata: [Local: Broker transport failure]
at com.vertica.util.ServerErrorData.buildException(Unknown Source)
at com.vertica.dataengine.VResultSet.fetchChunk(Unknown Source)
at com.vertica.dataengine.VResultSet.initialize(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.readExecuteResponse(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.handleExecuteResponse(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.execute(Unknown Source)
at com.vertica.jdbc.common.SPreparedStatement.executeWithParams(Unknown Source)
at com.vertica.jdbc.common.SPreparedStatement.executeQuery(Unknown Source)
at com.vertica.solutions.kafka.model.StreamSource.validateConfiguration(StreamSource.java:227)
... 8 more
Caused by: com.vertica.support.exceptions.NonTransientException: [Vertica][VJDBC](5861) ERROR: Error calling processPartition() in User Function KafkaListTopics at [/data/qb_workspaces/jenkins2/ReleaseBuilds/Grader/REL-9_3_1-x_grader/build/udx/supported/kafka/KafkaUtil.cpp:163], error code: 0, message: Error getting metadata: [Local: Broker transport failure]
If I query the table weblog_sched.stream_clusters, the hosts column contains localhost:9092 rather than the IP address of my Kafka server (192.168.0.8), although I specified the Kafka server's address when I created the cluster:
vkconfig cluster --create --cluster kafka_weblog --hosts 192.168.0.8:9092 --conf /opt/vertica/packages/kafka/config/my.conf
Why is that? (I think this error is related to the incorrect entry in weblog_sched.stream_clusters.)
There are a few ways to debug this:
Try telnet 192.168.0.8 9092. If you can connect, the port is open, Kafka is running, and it is accessible from the place where you ran the telnet command. If you cannot connect via telnet, try the following.
Check your firewall or iptables rules and allow port 9092 on that machine.
Ex: iptables -A INPUT -p tcp --dport 9092 -j ACCEPT
If you still cannot access it, try allowing that port in (or temporarily disabling) your system firewall (ufw, firewalld, etc.).
Check your advertised.listeners broker configuration. Ensure it is set to PLAINTEXT://192.168.0.8:9092 (if you are using PLAINTEXT). This is most likely to solve your problem.
Make sure to add the Kafka hosts to the /etc/hosts file of the Vertica cluster; that should resolve the issue. If it does, it means your advertised.listeners has been configured incorrectly.
Hardcoding hosts in /etc/hosts is definitely not the best approach, though; fixing advertised.listeners is the more permanent solution.
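As a quick check of what the broker actually hands back to clients, a sketch (the server.properties path is an assumption; adjust it to your install):
grep -E '^(advertised\.)?listeners' /path/to/kafka/config/server.properties
# for remote clients to get a reachable address back in the metadata, you typically want:
# listeners=PLAINTEXT://0.0.0.0:9092
# advertised.listeners=PLAINTEXT://192.168.0.8:9092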

While trying to run Apache Atlas I'm facing an HBase error (I'm using embedded HBase and Solr)

My Apache Atlas server is started, but I found errors in my application.log file.
The UI for Apache Atlas is also not running.
I've followed every step from the Apache website, and everything went fine.
I gave all permissions in the atlas-env.sh and application-properties files.
Can anyone help me figure this out?
2019-10-25 12:25:49,366 WARN - [main:] ~ Running setup per configuration atlas.server.run.setup.on.start. (SetupSteps$SetupRequired:186)
2019-10-25 12:25:50,104 WARN - [main:] ~ Retrieve cluster id failed (ConnectionImplementation:551)
java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/hbaseid
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:549)
at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:287)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:219)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:114)
at org.janusgraph.diskstorage.hbase2.HBaseCompat2_0.createConnection(HBaseCompat2_0.java:46)
at org.janusgraph.diskstorage.hbase2.HBaseStoreManager.<init>(HBaseStoreManager.java:314)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:58)
at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:476)
at org.janusgraph.diskstorage.Backend.getStorageManager(Backend.java:408)
at org.janusgraph.graphdb.configuration.GraphDatabaseC
When HBase starts, the HBase Master node creates the node "/hbase/hbaseid" in ZooKeeper.
1. Check the processes.
Check whether HBase and ZooKeeper are running with 'jps -m'.
If you configured HBase to manage ZooKeeper internally, you will not see a separate ZooKeeper process with the jps command; in that case check its port with 'netstat -nt | grep ZK_PORT' (it normally uses 2181).
netstat -nt | grep 2181
2. Check the zookeeper node
If you run the ZooKeeper cluster independently, you can check the node "/hbase/hbaseid" with the ZooKeeper CLI like this:
ZOOKEEPER/bin/zkCli.sh
[zk: ...] ls /
[zk: ...] get /hbase/hbaseid
I hope this could help you.
install atlas
You can download the source code of Atlas v2.0.0 or the master branch from here and build it.
$ export MAVEN_OPTS="-Xms2g -Xmx2g"
$ mvn clean install
$ mvn clean package -Pdist
If you build the master branch, you can find the server package under /SOURCE_CODE/distro/target/apache-atlas-3.0.0-SNAPSHOT-server/apache-atlas-3.0.0-SNAPSHOT.
You should configure the server before running it.
Here are the minimum settings; you can find the atlas-application.properties file in the conf directory.
atlas.graph.storage.hostname=xxx.xxx.xxx.xxx:xxxx => zookeeper addr and port for hbase
atlas.graph.index.search.backend=[solr or elasticsearch] => choose one you want to use.
atlas.graph.index.hostname=xxx.xxx.xxx.xxx => solr or elasticsearch server's addr
atlas.kafka.zookeeper.connect=xxx.xxx.xxx.xxx:xxxx => zookeeper addr and port for Kafka
atlas.kafka.bootstrap.servers=xxx.xxx.xxx.xxx:xxxx => kafka addr
atlas.audit.hbase.zookeeper.quorum=xxx.xxx.xxx.xxx:xxxx => zookeeper addr and port for hbase
To run the server,
$ bin/atlas_start.py
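To verify the server actually came up, a sketch (port 21000 and admin/admin are the Atlas defaults and may differ in your setup):
curl -u admin:admin http://localhost:21000/api/atlas/admin/version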
install zookeeper
Actually, to install ZooKeeper there is almost nothing to do; just follow the steps.
In this case, you should change your HBase environment in hbase-env.sh:
export HBASE_MANAGES_ZK=false
If you see warnings in the HBase log file like 'Could not start ZK at requested port of 2181', check the hbase-site.xml file and set hbase.cluster.distributed to true.
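For reference, a sketch of checking that property (the conf path is an assumption; use your HBase install's conf directory):
grep -A2 'hbase.cluster.distributed' /path/to/hbase/conf/hbase-site.xml
# it should resolve to:
#   <name>hbase.cluster.distributed</name>
#   <value>true</value>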

spark-submit --keytab option does not copy the file to executors

In my case I am using Spark (2.1.1), and for the processing I need to connect to Kafka (using Kerberos, therefore a keytab).
When submitting the job I can pass the keytab with the --keytab and --principal options. The main drawback is that the keytab will not be sent to the distributed cache (or at least not be available to the executors), so it fails:
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
...
Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner authentication information from the user
If I also pass it via --files it works (in version 2.1.0), but in this latest version (2.1.1) it is not allowed because it fails with:
Exception in thread "main" java.lang.IllegalArgumentException: Attempt to add (file:keytab.keytab) multiple times to the distributed cache.
Any tips?
I resolved this issue by making a copy of my keytab file (e.g. the original file is osboo.keytab and its copy is osboo-copy-for-kafka.keytab) and pushing the copy to HDFS via the --files option.
# Call
spark2-submit --keytab osboo.keytab \
--principal osboo \
--files osboo-copy-for-kafka.keytab#osboo-copy-for-kafka.keytab,kafka.jaas#kafka.jaas
# kafka.jaas
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="osboo-copy-for-kafka.keytab"
principal="osboo@REALM.COM"
serviceName="kafka";
};
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="osboo-copy-for-kafka.keytab"
serviceName="zookeeper"
principal="osboo@REALM.COM";
};
Maybe this solution requires less effort than keeping the symlinks between files in mind, so I hope it helps.
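For completeness, a sketch of how the JAAS file shipped via --files is usually pointed at from the driver and executor JVMs (these are standard Spark options, but the exact command is an assumption and the rest of the submit line is elided):
spark2-submit --keytab osboo.keytab \
--principal osboo \
--files osboo-copy-for-kafka.keytab#osboo-copy-for-kafka.keytab,kafka.jaas#kafka.jaas \
--driver-java-options "-Djava.security.auth.login.config=kafka.jaas" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=kafka.jaas" \
...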
The spark-submit --keytab option copies the file under a different name into the local container directory when you submit an app on YARN.
You can find this in launch_container.sh.
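If you want to see exactly how the keytab gets renamed inside the container, a sketch (the local-dirs path is an assumption; check yarn.nodemanager.local-dirs on your NodeManagers):
find /path/to/yarn/local-dirs/usercache -name launch_container.sh 2>/dev/null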

Kerberos issue on Spark when in cluster (YARN) mode

I am using Spark with Kerberos authentication.
I can run my code using spark-shell fine, and I can also use spark-submit in local mode (e.g. --master local[16]). Both function as expected.
local mode:
spark-submit --class "graphx_sp" --master local[16] --driver-memory 20G target/scala-2.10/graphx_sp_2.10-1.0.jar
I am now progressing to run in cluster mode using YARN.
From here I can see that you need to specify the location of the keytab and specify the principal. Thus:
spark-submit --class "graphx_sp" --master yarn --keytab /path/to/keytab --principal login_node --deploy-mode cluster --executor-memory 13G --total-executor-cores 32 target/scala-2.10/graphx_sp_2.10-1.0.jar
However, this returns:
Exception in thread "main" java.io.IOException: Login failure for login_node from keytab /path/to/keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:987)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:564)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user
at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:978)
... 4 more
Before I run spark-shell, or spark-submit in local mode, I do the following Kerberos setup:
kinit -k -t ~/keytab -r 7d `whoami`
Clearly, this setup does not extend to the YARN setup. How do I fix the Kerberos issue with YARN in cluster mode? Is this something that must be in my /src/main/scala/graphx_sp.scala file?
Update
By running kinit -V -k -t ~/keytab -r 7d `whoami` in verbose mode I was able to see that the principal was in the form user@node.
I updated this, checked the location of the keytab, and things passed through this checkpoint successfully:
INFO security.UserGroupInformation: Login successful for user user@login_node using keytab file /path/to/keytab
However, it then fails after this with:
client token: N/A
diagnostics: User class threw exception: org.apache.hadoop.security.AccessControlException: Authentication required
I have checked the permissions on the keytab and the read permissions are correct. It has been suggested that the next possibility is a corrupt keytab.
We found out that the Authentication required error happens when the application tries to read from HDFS. Scala was doing lazy evaluation, so it didn't fail until it started processing the file. The read was from this HDFS line: webhdfs://name:50070. Since WebHDFS defines a public HTTP REST API to permit access, I thought it was using ACLs, but enabling ui.view.acls didn't fix the issue. Adding --conf spark.yarn.access.namenodes=webhdfs://name:50070 fixed the problem. This option provides a comma-separated list of secure HDFS namenodes which the Spark application is going to access; Spark acquires the security tokens for each of the namenodes so that the application can access those remote HDFS clusters. This fixed the authentication required error.
Alternatively, direct access to HDFS via hdfs:// works and authenticates using Kerberos, with the principal and keytab being passed during spark-submit.
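Putting it together, a sketch of the cluster-mode submit with the fix described above added (the class, paths, and sizes are just the ones from the question):
spark-submit --class "graphx_sp" --master yarn --deploy-mode cluster \
--keytab /path/to/keytab --principal user@login_node \
--conf spark.yarn.access.namenodes=webhdfs://name:50070 \
--executor-memory 13G target/scala-2.10/graphx_sp_2.10-1.0.jar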