Kafka sink connector not committing offsets but consuming messages - apache-kafka

I am using the Snowflake Kafka sink connector to sink messages to Snowflake.
Docker image: cp-kafka-connect-base:6.2.0
I have two consumer pods running in distributed mode. Please find the connector config below:
connector.class: "com.snowflake.kafka.connector.SnowflakeSinkConnector"
tasks.max: "2"
topics: "test-topic"
snowflake.topic2table.map: "test-topic:table1"
buffer.count.records: "500000"
buffer.flush.time: "240"
buffer.size.bytes: "100000000"
snowflake.url.name: "<url>"
snowflake.warehouse.name: "name"
snowflake.user.name: "username"
snowflake.private.key: "key"
snowflake.private.key.passphrase: "pass"
snowflake.database.name: "db-name"
snowflake.schema.name: "schema-name"
key.converter: "com.snowflake.kafka.connector.records.SnowflakeJsonConverter"
value.converter: "com.snowflake.kafka.connector.records.SnowflakeJsonConverter"
envs:
CONNECT_GROUP_ID: "testgroup"
CONNECT_CONFIG_STORAGE_TOPIC: "snowflakesync-config"
CONNECT_STATUS_STORAGE_TOPIC: "snowflakesync-status"
CONNECT_OFFSET_STORAGE_TOPIC: "snowflakesync-offset"
CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_REST_ADVERTISED_HOST_NAME: "localhost"
CONNECT_REST_PORT: "8083"
CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: "3"
CONNECT_OFFSET_FLUSH_INTERVAL_MS: "5000"
CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: "3"
CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: "3"
CONNECT_INTERNAL_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECT_INTERNAL_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
CONNECTOR_NAME: "test-conn"
I am running two pods with the above config. The two pods are properly assigned one partition each and start consuming.
Question:
Whenever I deploy / restart the pods, the offsets are committed (CURRENT-OFFSET is updated) only ONCE; after that the sink connector keeps consuming messages from the topic, but the current offset is not updated at all (offsets are not getting committed).
kafka-consumer-groups --bootstrap-server <server> --describe --group connect-test-conn
This is the command I use to check whether CURRENT-OFFSET is being updated. Since the current offset is updated only once, the group always shows a lag, and the lag keeps increasing.
However, I can see from the logs (put records) and from Snowflake that the events are getting persisted.
I would like to know why the offsets are not being committed continuously.
Example case (output of the consumer-groups command):
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
events-sync 0 6408022 25524319 19116297 connector-consumer-events-sync-0-b9142c5f-3bb7-47b1-bd44-a169a7984952 /xx.xx.xx.xx connector-consumer-events-sync-0
events-sync 1 25521059 25521202 143 connector-consumer-events-sync-1-107f2aa8-969c-4d7e-87f8-fdb2be2480b3 /xx.xx.xx.xx connector-consumer-events-sync-1
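The connector and task state can also be checked through the Kafka Connect REST API, in case a task is silently failing between commits (a sketch, using the REST port and connector name from the config above, run from inside one of the connect pods):
curl -s http://localhost:8083/connectors/test-conn/status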

Related

Kafka-Streams app unknown topic or partition error

I have a problem trying to get a Kafka Streams app running against a Kafka cluster. As you can see from the bootstrap servers property, there are 3 Kafka servers in the cluster, and all is working fine there. But for some reason, when this app is started I keep getting "unknown topic or partition error" errors, as if the Streams app does not have the rights to create the changelog and repartition topics it needs to run. I have Kafka set to "auto.create.topics.enable=true", but I don't think this is the issue. Any clues would be a great help.
#Docker YAML file
#KV STORE Instance
kvstore1gcf102:
  image: kafka-kv-store:1.1.125
  restart: always
  hostname: kvstore_13_1_gcf102
  ports:
    - 7071:8080
  environment:
    APPLICATIONID_KVSTORE: "kafka-kv-store-gcf102"
    HOSTNAME_KVSTORE: "kvstore_13_1_gcf102"
    STREAMPROPS_KVSTORE: "bootstrap.servers=192.168.2.13:9092,192.168.2.14:9092,192.168.2.16:9092 num.standby.replicas=0"
    TOPIC_KVSTORE: "messages_shown"
Error I am getting:
18:31:11,637 WARN org.apache.kafka.clients.consumer.internals.Fetcher
- [Consumer clientId=kafka-kv-store-gcf102-ab44b41c-71cb-4cfa-926a-898cdfbb192c-StreamThread-1-consumer,
groupId=kafka-kv-store-gcf102] Received unknown topic or partition error
in ListOffset request for partition kafka-kv-store-gcf102-shown-count-kv-repartition-0
Edit:
root@SRV-003:~/tmp# /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server 192.168.2.13:9092 | grep kafka-kv-store-gcf102
kafka-kv-store-gcf102-shown-count-kv-changelog
kafka-kv-store-gcf102-shown-count-kv-repartition
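A way to double-check that partition 0 of the repartition topic actually exists and has a leader (a sketch, using the same broker as above):
/opt/kafka/bin/kafka-topics.sh --describe --bootstrap-server 192.168.2.13:9092 --topic kafka-kv-store-gcf102-shown-count-kv-repartition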

Kafka topic partition has missing offsets

I have a Flink streaming application which is consuming data from a Kafka topic that has 3 partitions. Even though the application is continuously running and working without any obvious errors, I see a lag in the consumer group for the Flink app on all 3 partitions.
./kafka-consumer-groups.sh --bootstrap-server $URL --all-groups --describe
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
group-1 topic-test 0 9566 9568 2 - - -
group-1 topic-test 1 9672 9673 1 - - -
group-1 topic-test 2 9508 9509 1 - - -
If I send new records, they get processed but the lag still exists. I tried to view the last few records for partition 0 and this is what I got (omitting the message part):
./kafka-console-consumer.sh --topic topic-test --bootstrap-server $URL --property print.offset=true --partition 0 --offset 9560
Offset:9560
Offset:9561
Offset:9562
Offset:9563
Offset:9564
Offset:9565
The log-end-offset value is at 9568 and the current offset is at 9566. Why are these offsets not available in the console consumer and why does this lag exist?
There were a few instances where I noticed missing offsets. Example -
Offset:2344
Offset:2345
Offset:2347
Offset:2348
Why did the offset jump from 2345 to 2347 (skipping 2346)? Does this have something to do with how the producer is writing to the topic?
You can describe your topic to see any configuration that was added when it was created. If log compaction is enabled through log.cleanup.policy=compact, the behaviour at runtime will be different. The lag you see can be due to the log cleaner's compaction-lag settings, and the missing offsets may be due to messages produced with a key but a null value (tombstones), which compaction removes.
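For example, a quick way to check whether a compaction override is set on the topic (a sketch, using the topic name from the question; topic-level overrides such as cleanup.policy=compact show up under "Configs"):
./kafka-topics.sh --bootstrap-server $URL --describe --topic topic-test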
Configuring The Log Cleaner
The log cleaner is enabled by default. This will start the pool of cleaner threads. To enable log cleaning on a particular topic, add the log-specific property log.cleanup.policy=compact.
The log.cleanup.policy property is a broker configuration setting defined in the broker's server.properties file; it affects all of the topics in the cluster that do not have a configuration override in place. The log cleaner can be configured to retain a minimum amount of the uncompacted "head" of the log. This is enabled by setting the compaction time lag log.cleaner.min.compaction.lag.ms.
This can be used to prevent messages newer than a minimum message age from being subject to compaction. If not set, all log segments are eligible for compaction except for the last segment, i.e. the one currently being written to. The active segment will not be compacted even if all of its messages are older than the minimum compaction time lag.
The log cleaner can be configured to ensure a maximum delay after which the uncompacted "head" of the log becomes eligible for log compaction log.cleaner.max.compaction.lag.ms.
The lag is calculated based on the latest offset committed by the Kafka consumer (lag = latest offset - latest committed offset). In general, Flink commits Kafka offsets only when it performs a checkpoint, so there is always some lag if you check it using the consumer-groups command.
That doesn't mean that Flink hasn't consumed and processed all the messages in the topic/partition; it just means that it has not committed them yet.

How to delete the consumer offset of a group for one specific topic

Assuming I have two topics (both with two partitions and infinite retention):
my_topic_a
my_topic_b
and one consumer group:
my_consumer
At some point, it was consuming both topics, but due to some changes, it's no longer interested in my_topic_a, so it stopped consuming it and now is accumulating lag:
kafka-consumer-groups.sh --bootstrap-server=kafka.core-kafka.svc.cluster.local:9092 --group my_consumer --describe
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
my_topic_a 0 300000 400000 100000 - - -
my_topic_a 1 300000 400000 100000 - - -
my_topic_b 0 500000 500000 0 - - -
my_topic_b 1 500000 500000 0 - - -
This lag is annoying me because:
My consumer-lag graph in Grafana is tainted.
An automatic alarm is triggered, reminding me about a consumer lagging too much.
Thus I want to get rid of the offsets for my_topic_a of my_consumer, to get to a state as if my_consumer had never consumed my_topic_a.
The following attempt fails:
kafka-consumer-groups.sh --bootstrap-server kafka:9092 --group my_consumer_group --delete --topic domain.user
With this output:
The consumer does not support topic-specific offset deletion from a consumer group.
How can I achieve my goal? (Temporarily stopping all consumers of this group would be a feasible option in my use-case.)
(I'm using Kafka version 2.2.0.)
My guess is, something can be done by writing something to topic __consumer_offsets, but I don't know what it would be. Currently, this topic looks as follows (again, simplified):
kafka-console-consumer.sh --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" --bootstrap-server kafka:9092 --topic __consumer_offsets --from-beginning
...
[my_consumer_group,my_topic_a,0]::OffsetAndMetadata(offset=299999, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1605000000000, expireTimestamp=None)
[my_consumer_group,my_topic_a,0]::OffsetAndMetadata(offset=300000, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1605000100000, expireTimestamp=None)
...
[my_consumer_group,my_topic_a,1]::OffsetAndMetadata(offset=299999, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1605000000000, expireTimestamp=None)
[my_consumer_group,my_topic_a,1]::OffsetAndMetadata(offset=300000, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1605000100000, expireTimestamp=None)
...
[my_consumer_group,my_topic_b,0]::OffsetAndMetadata(offset=499999, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1607000000000, expireTimestamp=None)
[my_consumer_group,my_topic_b,0]::OffsetAndMetadata(offset=500000, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1607000100000, expireTimestamp=None)
...
[my_consumer_group,my_topic_b,1]::OffsetAndMetadata(offset=499999, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1607000000000, expireTimestamp=None)
[my_consumer_group,my_topic_b,1]::OffsetAndMetadata(offset=500000, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1607000100000, expireTimestamp=None)
In the meantime (Kafka 2.8), this has become possible with the new --delete-offsets parameter for kafka-consumer-groups.sh. :-)
https://stackoverflow.com/a/66644574/1866775
https://cwiki.apache.org/confluence/display/KAFKA/KIP-496%3A+Administrative+API+to+delete+consumer+offsets
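A minimal sketch of that approach, assuming the group and topic names from the question (the group must not be actively consuming the topic when its offsets are deleted):
kafka-consumer-groups.sh --bootstrap-server kafka:9092 --delete-offsets --group my_consumer --topic my_topic_a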
The output you are given:
"The consumer does not support topic-specific offset deletion from a consumer group."
is an indicator that it is not possible to remove a specific topic from a consumer group.
You could change the consumer group for the new application that reads only my_topic_b, restart the application, and then remove the old, now idle consumer group completely. With that approach you will be able to track the consumer lag without any distractions or alerts popping up. When restarting the application with a new consumer group, it is usually best to stop the producer for topic "b" during the restart, to make sure you are not missing any messages.
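A sketch of that last step, deleting the old, now idle consumer group entirely (this only works once the group has no active members):
kafka-consumer-groups.sh --bootstrap-server kafka:9092 --delete --group my_consumer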
I would really avoid playing around manually with the topic __consumer_offsets.
As an alternative, you could regularly run a command-line tool that comes with Kafka to reduce the lag of your consumer group:
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --group my_consumer --topic my_topic_a --to-latest
You may need to add the --execute option.
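For example, a dry run first and then the actual reset (a sketch; the group must have no active members when --execute is run):
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --group my_consumer --topic my_topic_a --to-latest --dry-run
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --group my_consumer --topic my_topic_a --to-latest --execute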

Kafka-connect, Bootstrap broker disconnected

I'm trying to set up Kafka Connect with the intent of running an ElasticsearchSinkConnector.
The Kafka setup consists of 3 brokers secured using Kerberos, SSL and ACLs.
So far I've been experimenting with running the Connect framework and the Elasticsearch server locally using docker/docker-compose (Confluent docker image 5.4 with Kafka 2.4), connecting to the remote Kafka installation (Kafka 2.0.1 - actually our production environment).
KAFKA_OPTS: -Djava.security.krb5.conf=/etc/kafka-connect/secrets/krb5.conf
CONNECT_BOOTSTRAP_SERVERS: srv-kafka-1.XXX.com:9093,srv-kafka-2.XXX.com:9093,srv-kafka-3.XXX.com:9093
CONNECT_REST_ADVERTISED_HOST_NAME: kafka-connect
CONNECT_REST_PORT: 8083
CONNECT_GROUP_ID: user-grp
CONNECT_CONFIG_STORAGE_TOPIC: test.internal.connect.configs
CONNECT_OFFSET_STORAGE_TOPIC: test.internal.connect.offsets
CONNECT_STATUS_STORAGE_TOPIC: test.internal.connect.status
CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_INTERNAL_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_INTERNAL_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_ZOOKEEPER_CONNECT: srv-kafka-1.XXX.com:2181,srv-kafka-2.XXX.com:2181,srv-kafka-3.XXX.com:2181
CONNECT_SECURITY_PROTOCOL: SASL_SSL
CONNECT_SASL_KERBEROS_SERVICE_NAME: "kafka"
CONNECT_SASL_JAAS_CONFIG: com.sun.security.auth.module.Krb5LoginModule required \
useKeyTab=true \
storeKey=true \
keyTab="/etc/kafka-connect/secrets/kafka-connect.keytab" \
principal="<principal>";
CONNECT_SASL_MECHANISM: GSSAPI
CONNECT_SSL_TRUSTSTORE_LOCATION: <path_to_truststore.jks>
CONNECT_SSL_TRUSTSTORE_PASSWORD: <PWD>
When starting the Connect framework everything seems to work fine; I can see logs claiming that the Kerberos authentication is successful, etc.
The problem comes when I try to start a connect job using curl.
curl -X POST -H "Content-Type: application/json" --data '{ "name": "kafka-connect", "config": { "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector", "tasks.max": 1, "topics": "test.output.outage", "key.ignore": true, "connection.url": "http://elasticsearch1:9200", "type.name": "kafka-connect" } }' http://localhost:8083/connectors
The job seems to start up without issues, but as soon as it is about to start consuming from the Kafka topic I get:
kafka-connect | [2020-04-06 10:35:33,482] WARN [Consumer clientId=connector-consumer-user-grp-2-0, groupId=connect-user-2] Bootstrap broker srv-kafka-1.XXX.com:9093 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
repeated in the Connect log for all brokers.
What is the nature of this issue? Communication with the brokers seems to work well - the connect job is communicated back to Kafka as intended, and when the Connect framework is restarted the job seems to resume as intended (even though still faulty).
Does anyone have an idea what might be causing this, or how I should go about debugging it?
Since it is our production environment, I have only a limited ability to change the server configuration. But from what I can tell, nothing in the logs seems to indicate that something is wrong.
Thanks in advance
Per docs, you need to also configure security on the consumer/producer for the connector(s) that Kafka Connect is running. You do this by adding a consumer/producer prefix. So since you're using Docker, and the error suggests that you were creating a sink connector (i.e. requiring a consumer), add to your config:
CONNECT_CONSUMER_SECURITY_PROTOCOL: SASL_SSL
CONNECT_CONSUMER_SASL_KERBEROS_SERVICE_NAME: "kafka"
CONNECT_CONSUMER_SASL_JAAS_CONFIG: com.sun.security.auth.module.Krb5LoginModule required \
useKeyTab=true \
storeKey=true \
keyTab="/etc/kafka-connect/secrets/kafka-connect.keytab" \
principal="<principal>";
CONNECT_CONSUMER_SASL_MECHANISM: GSSAPI
CONNECT_CONSUMER_SSL_TRUSTSTORE_LOCATION: <path_to_truststore.jks>
CONNECT_CONSUMER_SSL_TRUSTSTORE_PASSWORD: <PWD>
If you're also creating a source connector, you'll need to replicate the above but for the PRODUCER_ prefix too.
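After restarting the worker with these variables, one way to confirm that the sink's consumer can now reach the brokers is to check the task state via the Connect REST API (a sketch, using the connector name and REST port from the question, run from inside the connect container):
curl -s http://localhost:8083/connectors/kafka-connect/status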

How can we run multiple kafka consumers through command line?

I am testing Kafka performance through the shell scripts already provided in the Kafka package. I have created a topic with 10 partitions and am pumping data as shown below:
./bin/kafka-producer-perf-test.sh --topic test-topic --num-records 9000000 --record-size 300 --throughput 250000 --producer-props bootstrap.servers=110.17.14.302:9092 acks=1 max.in.flight.requests.per.connection=1 batch.size=5000
Now I want to consume the data which I am pumping as shown above from multiple consumers, not just from a single consumer. So I started using kafka-consumer-perf-test.sh. This is what I was doing:
./bin/kafka-consumer-perf-test.sh --zookeeper localhost:2181 --topic test-topic --group test1
Is there any way we can run multiple Kafka consumers in a single consumer group through the command line, with each of those consumers working on different partitions, using kafka-consumer-perf-test.sh? I am working with Kafka version 0.10.1.0.
I saw this SO post, but it doesn't say where to configure how many consumers we want to run and which partitions they will work on.
Update:
This is the error I saw:
./bin/kafka-consumer-perf-test.sh --zookeeper 110.27.14.10:2181 --messages 50 --topic test-topic --threads 1
[2017-01-11 22:34:09,785] WARN [ConsumerFetcherThread-perf-consumer-14195_kafka-cluster-3098529006-zeidk-1484174043509-46a51434-2-0], Error in fetch kafka.consumer.ConsumerFetcherThread$FetchRequest#54fb48b6 (kafka.consumer.ConsumerFetcherThread)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
at kafka.network.BlockingChannel.readCompletely(BlockingChannel.scala:129)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:120)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:99)
at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:83)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:132)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:132)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:132)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:131)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:131)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:131)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:130)
at kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:109)
at kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:29)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Just run the same command (i.e., ./bin/kafka-consumer-perf-test.sh) multiple times in different consoles.
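For example, a sketch using the same flags as in the question (with the same --group, the instances split the topic's partitions between them; the --messages count is just an illustrative value):
# console 1
./bin/kafka-consumer-perf-test.sh --zookeeper localhost:2181 --topic test-topic --group test1 --messages 4500000
# console 2
./bin/kafka-consumer-perf-test.sh --zookeeper localhost:2181 --topic test-topic --group test1 --messages 4500000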
About partition assignment: Kafka will do this automatically for you if you use consumer groups.
If you want to do manual partition assignment, you cannot use consumer groups. For this, you cannot use kafka-consumer-perf-test.sh but need to write your own consumer.
Read JavaDoc here: https://kafka.apache.org/0101/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html