Kafka scheduler in Vertica 7.2 is running and working, but produces errors - apache-kafka

When I run /opt/vertica/packages/kafka/bin/vkconfig launch I get the following warning:
Unable to determine hostname, defaulting to 'unknown' in scheduler history
But the scheduler continues working fine and consuming messages from Kafka. What does it mean?
The next strange thing is that I find the following records in /home/dbadmin/events/dbLog (I think it is the Kafka consumer log file):
%3|1446726706.945|FAIL|vertica#consumer-1|
localhost:4083/bootstrap: Failed to connect to broker at
[localhost]:4083: Connection refused
%3|1446726706.945|ERROR|vertica#consumer-1| localhost:4083/bootstrap:
Failed to connect to broker at [localhost]:4083: Connection refused
%3|1446726610.267|ERROR|vertica#consumer-1| 1/1 brokers are down
As I mentioned, the scheduler does finally start, but these records periodically appear in the logs. What is this localhost:4083? Normally my broker runs on port 9092 on a separate server, which is described in the kafka_config.kafka_scheduler table.

To populate the scheduler history table, the scheduler attempts to determine its hostname using Java:
InetAddress.getLocalHost().getHostAddress();
This will sometimes result in an UnknownHostException for various reasons (you can check documentation here: https://docs.oracle.com/javase/7/docs/api/java/net/UnknownHostException.html)
If this occurs, the hostname will default to "unknown" in that table. Luckily, the schedulers coordinate by taking locks in your Vertica database, so knowing exactly which host a scheduler is running on is unnecessary for functionality (it only matters for monitoring).
The Kafka-related logging in dbLog is probably the standard output from librdkafka (https://github.com/edenhill/librdkafka). I'm not sure what is going on with that log message, unfortunately. Vertica should only be using the configured broker list.
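If you want to double-check which brokers the scheduler is actually configured to use, you could inspect the configuration table mentioned in the question (a minimal sketch using Vertica's vsql client; the exact columns of kafka_config.kafka_scheduler vary by version):
vsql -c "SELECT * FROM kafka_config.kafka_scheduler;"
If anything there points at localhost:4083 rather than your broker on port 9092, that would explain the connection attempts in the log.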

Related

Kafka connect error when trying to use multiple listeners for kafka server

I am deploying a single-node Confluent setup, so I am only manipulating connect-standalone.properties and server.properties.
I am trying to connect a remote producer to my local set-up, so I have the following overrides in server.properties:
listeners=PLAINTEXT://10.20.23.105:9092,EXTERNAL://10.20.23.105:29092
advertised.listeners=PLAINTEXT://10.20.23.105:9092,EXTERNAL://localhost:29092
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,EXTERNAL:PLAINTEXT
After checking using Offset Explorer, I can see that Kafka is still working and I am successfully getting the remote stream. However, Connect fails upon trying to start the service.
[2023-02-13 10:26:02,992] ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectDistributed:85)
org.apache.kafka.connect.errors.ConnectException: Failed to connect to and describe Kafka cluster. Check worker's broker connection and security properties.
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:79)
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:60)
at org.apache.kafka.connect.cli.ConnectDistributed.startConnect(ConnectDistributed.java:96)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:79)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:73)
What are the possible fixes for this problem?
I have checked out this question Kafka-connect, Bootstrap broker disconnected, but since I am still using PLAINTEXT for my external listener, there shouldn't need to be any changes to the workers, right?
Kafka Connect isn't the problem. Start debugging with kafka-console-producer, for example.
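For example, a quick connectivity check from the machine running Connect might look like this (a sketch; the script is kafka-console-producer.sh in plain Apache Kafka, older versions use --broker-list instead of --bootstrap-server, and test-topic is just a placeholder):
kafka-console-producer --bootstrap-server 10.20.23.105:9092 --topic test-topic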
listeners should not be hard-coded to any one IP.
Use bind addresses to allow connections from all interfaces.
listeners=PLAINTEXT://0.0.0.0:9092,EXTERNAL://0.0.0.0:29092
For advertised listeners, "external" addresses should use a LAN IP. It is not clear what your "localhost" listener is needed for here, since any connection to that IP from the same machine would route back to itself by default. More importantly, you don't need two ports open for the same protocol.
You've not shown your connect worker properties, but if it is running on an external machine, make sure there is no firewall interfering with the connection, and that you are using the correct IP/hostname and ports.
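Putting that together, a simplified version of the broker settings from the question might look like this (a sketch only; 10.20.23.105 is the LAN IP taken from the question):
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://10.20.23.105:9092
The Connect worker, wherever it runs, would then point its bootstrap.servers at that advertised address (10.20.23.105:9092) in its worker properties.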

How can I know if I'm suffering data loss during Kafka Connect intermittent read-from-source issues?

We are running Kafka Connect using the Confluent JDBC Source Connector to read from a DB2 database. Periodically, we see issues like this in our Kafka Connect logs:
kafkaconnect-deploy-prod-967ddfffb-5l4cm 2021-04-23 10:39:43.770 ERROR Failed to run query for table TimestampIncrementingTableQuerier{table="PRODSCHEMA"."VW_PRODVIEW", query='null', topicPrefix='some-topic-prefix-', incrementingColumn='', timestampColumns=[UPDATEDATETIME]}: {} (io.confluent.connect.jdbc.source.JdbcSourceTask:404)
com.ibm.db2.jcc.am.SqlException: DB2 SQL Error: SQLCODE=-668, SQLSTATE=57007, SQLERRMC=1;PRODSCHEMA.SOURCE_TABLE, DRIVER=4.28.11
at com.ibm.db2.jcc.am.b7.a(b7.java:815)
...
at com.ibm.db2.jcc.am.k7.bd(k7.java:785)
at com.ibm.db2.jcc.am.k7.executeQuery(k7.java:750)
at io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier.executeQuery(TimestampIncrementingTableQuerier.java:200)
at io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier.maybeStartQuery(TimestampIncrementingTableQuerier.java:159)
at io.confluent.connect.jdbc.source.JdbcSourceTask.poll(JdbcSourceTask.java:371)
This appears to be an intermittent issue connecting to DB2, and is semi-expected; for reasons outside the scope of this question, we know that the network between the two is unreliable.
However, what we are trying to establish is whether in this circumstance data loss is likely to have occurred. I've found this article which talks about error handling in Kafka Connect, but it only refers to errors due to broken messages, not the actual connectivity between Kafka Connect and the data source.
In this case, how would we know if the failure to connect had caused data loss (i.e. records in our data source that were never processed into the target topic)? Would there be errors in the Kafka Connect log? Will Kafka Connect always retry indefinitely when it has a connectivity issue? Are there any controls over its retry behavior?
(If it matters, Kafka Connect is version 2.5; it is deployed in a Kubernetes cluster, in distributed mode, but with only one actual running worker/container.)

Kafka - Error on specific consumer -Broker not available

We have deployed multiple Kafka consumers in container clusters. All are working properly except for one, which is throwing the warning "Connection to node 0 could not be established. Broker may not be available". However, this error appears only in one of the containers, and this consumer is running in the same network and on the same server as the others, so I have ruled out issues with the Kafka server configuration.
I tried changing the group id of the consumer and it worked for some minutes, but now the warning is appearing again. I can also consume all the topics used by this consumer from a bash shell.
Taking the above context into account, I think it could be due to bad practice in the consumer software code, or to offsets that have become corrupted. How could I identify either of these from the Kafka logs?
You can exec into the container and netcat the broker's advertised addresses to verify connectivity.
You can also use the Kafka shell scripts to verify consuming functionality, as always.
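For example (a sketch; substitute the broker's advertised host/port and a topic this consumer actually reads):
# from inside the affected container: check raw TCP connectivity to the broker
nc -vz <broker-host> 9092
# then try reading with the stock console consumer
kafka-console-consumer --bootstrap-server <broker-host>:9092 --topic <topic> --from-beginning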
Corrupted offsets would prevent any consumer from reading, not only one. Bad code practices wouldn't show up in the logs.
If you have the container running "on the same server as the others", I'd suggest working with affinity rules and constraints to spread your applications across multiple servers rather than placing them all on the same machine.

Error running multiple kafka standalone hdfs connectors

We are trying to launch multiple standalone kafka hdfs connectors on a given node.
For each connector, we are setting the rest.port and offset.storage.file.filename to different ports and path respectively.
Also, the Kafka broker JMX port is 9999.
When I start the kafka standalone connector, I get the error
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 9999; nested exception is:
java.net.BindException: Address already in use (Bind failed)
This happens even though rest.port is set to 9100.
kafka version: 2.12-0.10.2.1
kafka-connect-hdfs version: 3.2.1
Please help.
We are trying to launch multiple standalone kafka hdfs connectors on a given node.
Have you considered running these multiple connectors within a single instance of Kafka Connect? This might make things easier.
Kafka Connect itself can handle running multiple connectors within a single worker process. Kafka Connect in distributed mode can run on a single node, or across multiple ones.
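For reference, a minimal distributed-mode worker configuration looks roughly like this (a sketch; the topic names are only examples, and the replication factors of 1 assume a single-broker cluster):
bootstrap.servers=localhost:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1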
For those who are trying to use the rest.port flag and still getting the Address already in use error: that flag was marked as deprecated in KIP-208 and finally removed in a later PR.
From that point on, listeners can be used to change the default REST port.
Examples from Javadoc
listeners=HTTP://myhost:8083
listeners=HTTP://:8083
Configuring and Running Workers - Standalone mode
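As a sketch, two standalone workers on the same node could be kept from colliding by giving each one its own REST listener and offsets file (the ports and paths below are only examples):
# worker-a.properties
listeners=HTTP://:8083
offset.storage.file.filename=/tmp/connect-a.offsets
# worker-b.properties
listeners=HTTP://:8084
offset.storage.file.filename=/tmp/connect-b.offsets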
You may have Kafka Connect processes already running that you don't know about. You can check this with:
ps -ef | grep connect
If you find any, kill those processes.

How to start Zookeeper and then Kafka?

I'm getting started with Confluent Platform, which requires running Zookeeper (zookeeper-server-start /etc/kafka/zookeeper.properties) and then Kafka (kafka-server-start /etc/kafka/server.properties). I am writing an Upstart script that should run both Kafka and Zookeeper. The issue is that Kafka should block until Zookeeper is ready (because it depends on it), but I can't find a reliable way to know when Zookeeper is ready. Here are some attempts in pseudo-code after running the Zookeeper server start:
1. Use a hardcoded delay:
sleep 5
This does not work reliably on slower computers and/or waits longer than needed.
2. Check when something (hopefully Zookeeper) is listening on port 2181:
wait until $(echo stat | nc localhost ${port}) is not none
This did not seem to work, as it doesn't wait long enough for Zookeeper to accept a Kafka connection.
3. Check the logs:
wait until a specific string is found in the zookeeper log
This is sketchy, and there isn't even a string that appears only on success and never on error (e.g. "binding to port [...]").
Is there a reliable way to know when Zookeeper is ready to accept a Kafka connection? Otherwise, I will have to resort to a combination of 1 and 2.
I found that using a timer is not reliable. The second option (waiting for the port) worked for me:
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties && \
while ! nc -z localhost 2181; do sleep 0.1; done && \
bin/kafka-server-start.sh -daemon config/server.properties
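An alternative readiness check (a sketch; note that recent ZooKeeper releases only answer four-letter-word commands that are whitelisted via 4lw.commands.whitelist) is to wait until ZooKeeper responds to ruok with imok:
while [ "$(echo ruok | nc localhost 2181)" != "imok" ]; do sleep 0.1; done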
The Kafka error message from your comment is definitely relevant:
FATAL [Kafka Server 0], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) java.lang.RuntimeException: A broker is already registered on the path /brokers/ids/0. This probably indicates that you either have configured a brokerid that is already in use, or else you have shutdown this broker and restarted it faster than the zookeeper timeout so it appears to be re-registering.
This indicates that ZooKeeper is up and running, and Kafka was able to connect to it. As I would have expected, technique #2 was sufficient for verifying that ZooKeeper is ready to accept connections.
Instead, the problem appears to be on the Kafka side. It has registered a ZooKeeper ephemeral node to represent the starting Kafka broker. An ephemeral node is deleted automatically when the client's ZooKeeper session expires (e.g. the process terminates, so it stops heartbeating to ZooKeeper). However, this is based on timeouts. If the Kafka broker restarts rapidly, then after the restart it sees that a znode representing that broker already exists. To the newly started process, this looks like another broker is already started and registered at that path. Since brokers are expected to have unique IDs, it aborts.
Waiting for a period of time past the ZooKeeper session expiration is an appropriate response to this problem. If necessary, you could potentially tune the session expiration to happen faster as discussed in the ZooKeeper Administrator's Guide. (See discussion of tickTime, minSessionTimeout and maxSessionTimeout.) However, tuning session expiration to something too rapid could cause clients to experience spurious session expirations during normal operations.
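For reference, the relevant settings live in zoo.cfg; by default minSessionTimeout is 2*tickTime and maxSessionTimeout is 20*tickTime (the values below are only an illustration):
# zoo.cfg (sketch)
tickTime=2000
minSessionTimeout=4000
maxSessionTimeout=10000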
I have less knowledge on Kafka, but perhaps there is also something that can be done on the Kafka side. I know that some management tools like Apache Ambari take steps to guarantee assignment of a unique ID to each broker on provisioning.