Kafka Connect AWS S3 sink connector doesn't read from topic - apache-kafka

I have a simple standalone S3 sink connector. Here is the relevant part of worker configuration properties:
plugin.path = <plugins directory>
bootstrap.servers = <List of servers on Amazon MSK>
security.protocol = SSL
...
It works fine when I connect it to a locally running Kafka. However, when I connect it to a Kafka broker on AWS (with SSL), it doesn't consume anything. No errors, nothing, as if the topic were empty:
[2020-01-30 10:50:03,597] INFO Started S3 connector task with assigned partitions: [] (io.confluent.connect.s3.S3SinkTask:116)
[2020-01-30 10:50:03,598] INFO WorkerSinkTask{id=xxx} Sink task finished initialization and start (org.apache.kafka.connect.runtime.WorkerSinkTask:302)
When I enabled DEBUG mode in connect-log4j.properties, I started seeing lots of error messages:
Completed connection to node -2. Fetching API versions. (org.apache.kafka.clients.NetworkClient:914)
Initiating API versions fetch from node -2. (org.apache.kafka.clients.NetworkClient:928)
Connection with YYY disconnected (org.apache.kafka.common.network.Selector:607)
java.io.EOFException
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:119)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
...
Node -2 disconnected. (org.apache.kafka.clients.NetworkClient:894)
Initialize connection to node XXX (id: -3 rack: null) for sending metadata request (org.apache.kafka.clients.NetworkClient:1125)
Initiating connection to node XXX (id: -3 rack: null) using address XXX (org.apache.kafka.clients.NetworkClient:956)
Am I missing something in the SSL configuration? Note that a manually created org.apache.kafka.clients.consumer.KafkaConsumer can successfully read from this topic with only "security.protocol = SSL" set.
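For reference, a minimal sketch of such a consumer (the class name, group id, topic, and server list are placeholders; byte-array deserializers match the connector's ByteArrayFormat):
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SslConsumerCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "<List of servers on Amazon MSK>"); // placeholder
        props.put("security.protocol", "SSL"); // the only SSL-related setting needed
        props.put("group.id", "ssl-consumer-check");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("some_topic"));
            // a single poll is enough to confirm the topic is readable over SSL
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(10));
            for (ConsumerRecord<byte[], byte[]> record : records) {
                System.out.printf("offset=%d, %d bytes%n", record.offset(), record.value().length);
            }
        }
    }
}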
EDIT:
Here are the connector properties:
name = my-connector
connector.class = io.confluent.connect.s3.S3SinkConnector
topics = some_topic
timestamp.extractor = Record
locale = de_DE
timezone = UTC
storage.class = io.confluent.connect.s3.storage.S3Storage
partitioner.class = io.confluent.connect.storage.partitioner.HourlyPartitioner
format.class = io.confluent.connect.s3.format.bytearray.ByteArrayFormat
s3.bucket.name = some-s3-bucket
s3.compression.type = gzip
flush.size = 3
s3.region = eu-central-1

I had a similar problem, which was solved after I additionally specified the security protocol for the consumer (besides the global one). Just add
consumer.security.protocol = SSL
to the worker configuration properties.
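For context: the unprefixed security.protocol in the worker configuration only applies to the worker's own internal clients; the consumers created for sink tasks (and the producers for source tasks) take their settings from the consumer.- and producer.-prefixed keys. A minimal sketch of the relevant worker properties, using the same placeholders as above:
plugin.path = <plugins directory>
bootstrap.servers = <List of servers on Amazon MSK>
# worker's own internal clients
security.protocol = SSL
# consumers created for sink tasks (the S3 sink reads through these)
consumer.security.protocol = SSL
# producers created for source tasks, if any
producer.security.protocol = SSL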

Related

AWS MSK - Internal Brokers communication

I am using AWS MSK for our production workload, and we have been noticing some unclear log messages in CloudWatch. The messages are about the internal communication between brokers (more on the cluster setup later):
[2022-05-14 06:50:17,171] INFO [SocketServer brokerId=2] Failed authentication with ec2-18-185-175-128.eu-central-1.compute.amazonaws.com/18.185.175.128 ([97fe8ff0-ee38-46c5-ae21-1545fd571224]: Access denied) (org.apache.kafka.common.network.Selector)
Our logs are cluttered with these recurring messages. They appear on all three brokers, and all reference brokerId=2, as in the message above.
I am assuming the instance referenced is one of the MSK brokers.
While the logs are at INFO level and the cluster seems to work fine, I'd like to know whether anyone has faced this sort of output before.
The MSK config is the following:
3 brokers over 3 availability zones
encryption in transit: client_broker = TLS, plus in-cluster encryption
client_authentication: SASL/IAM
cluster properties: auto.create.topics.enable = true, default.replication.factor = 3, num.partitions = 3, delete.topic.enable = true, min.insync.replicas = 2, log.retention.hours = 168, compression.type = gzip
kafka version: 2.7.0
I would be interested to know how to get rid of this log message and whether it should be a cause for concern.
Thanks,
Alessio

How to set consumer config values for Kafka Mirrormaker-2 2.6.1?

I am attempting to use MirrorMaker 2 to replicate data between AWS Managed Kafka (MSK) clusters in 2 different AWS regions: one in eu-west-1 (CLOUD_EU) and the other in us-west-2 (CLOUD_NA), both running Kafka 2.6.1. For testing, I am currently trying to replicate topics one way, from EU to NA.
I am starting a MirrorMaker Connect cluster using ./bin/connect-mirror-maker.sh and a properties file (included below).
This works fine for topics with small messages on them, but one of my topics has binary messages up to 20MB in size. When I try to replicate that topic, I get an error every 30 seconds:
[2022-04-21 13:47:05,268] INFO [Consumer clientId=consumer-29, groupId=null] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 2: {}. (org.apache.kafka.clients.FetchSessionHandler:481)
org.apache.kafka.common.errors.DisconnectException
When logging at DEBUG level to get more information, we see:
[2022-04-21 13:47:05,267] DEBUG [Consumer clientId=consumer-29, groupId=null] Disconnecting from node 2 due to request timeout. (org.apache.kafka.clients.NetworkClient:784)
[2022-04-21 13:47:05,268] DEBUG [Consumer clientId=consumer-29, groupId=null] Cancelled request with header RequestHeader(apiKey=FETCH, apiVersion=11, clientId=consumer-29, correlationId=35) due to node 2 being disconnected (org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient:593)
It gets stuck in a loop, constantly disconnecting with a request timeout every 30s and then retrying.
Looking at this, I suspect the problem is that request.timeout.ms is at its default (30s), and the consumer times out trying to read the topic with many large messages.
I followed the guide at https://github.com/apache/kafka/tree/trunk/connect/mirror to attempt to configure the consumer properties. However, no matter what I set, the consumer's timeout remains fixed at the default, confirmed both by Kafka printing its config in the log and by timing the interval between disconnect messages. For example, I set:
CLOUD_EU.consumer.request.timeout.ms=120000
in the properties file that I start MM2 with.
Based on various guides I found while looking into this, I have also tried:
CLOUD_EU.request.timeout.ms=120000
CLOUD_EU.cluster.consumer.request.timeout.ms=120000
CLOUD_EU.consumer.override.request.timeout.ms=120000
CLOUD_EU.cluster.consumer.override.request.timeout.ms=120000
None of these has worked.
How can I change the consumer's request.timeout.ms setting? The log is approximately 10,000 lines long, but everywhere the ConsumerConfig is logged, it shows request.timeout.ms = 30000.
Properties file I am using:
# specify any number of cluster aliases
clusters = CLOUD_EU, CLOUD_NA
# connection information for each cluster
CLOUD_EU.bootstrap.servers = kafka.eu-west-1.amazonaws.com:9092
CLOUD_NA.bootstrap.servers = kafka.us-west-2.amazonaws.com:9092
# enable and configure individual replication flows
CLOUD_EU->CLOUD_NA.enabled = true
CLOUD_EU->CLOUD_NA.topics = METRICS_ATTACHMENTS_OVERSIZE_EU
CLOUD_NA->CLOUD_EU.enabled = false
replication.factor=3
tasks.max = 1
############################# Internal Topic Settings #############################
checkpoints.topic.replication.factor=3
heartbeats.topic.replication.factor=3
offset-syncs.topic.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
config.storage.replication.factor=3
############################ Kafka Settings ###################################
# CLOUD_EU cluster over writes
CLOUD_EU.consumer.request.timeout.ms=120000
CLOUD_EU.consumer.session.timeout.ms=150000

Kafka Snowflake ConnectStandalone - Error while starting the Snowflake connector

[SF_KAFKA_CONNECTOR] SnowflakeSinkTask[ID:0]:start. Time: 0 seconds (com.snowflake.kafka.connector.SnowflakeSinkTask:154)
[2021-09-07 23:19:44,145] INFO WorkerSinkTask{id=snowflakeslink-0} Sink task finished initialization and start (org.apache.kafka.connect.runtime.WorkerSinkTask:309)
[2021-09-07 23:19:44,169] WARN [Consumer clientId=connector-consumer-snowflakeslink-0, groupId=connect-snowflakeslink] Connection to node -1 (localhost/127.0.0.1:9092) terminated during authentication. This may happen due to any of the following reasons: (1) Authentication failed due to invalid credentials with brokers older than 1.0.0, (2) Firewall blocking Kafka TLS traffic (eg it may only allow HTTPS traffic), (3) Transient network issue. (org.apache.kafka.clients.NetworkClient:769)
[2021-09-07 23:19:44,170] WARN [Consumer clientId=connector-consumer-snowflakeslink-0, groupId=connect-snowflakeslink] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient:1060)
Connection ... terminated during authentication
You need to remove consumer.security.protocol=SSL from your connect-standalone.properties, since your broker's server.properties listener is not using SSL.
Your next error:
Failed to find any class that implements Connector and which name matches com.snowflake.kafka.connector.SnowflakeSinkConnector, available connectors are: PluginDesc{klass=class org.apache.kafka.connect.file.FileStreamSinkConnector, name='org.apache.kafka.connect.file.FileStreamSinkConnector
Look at the list: it indeed doesn't exist, which means you haven't correctly extracted the Snowflake connector libraries into the plugin.path. That should be a folder external to Kafka's internal lib folder, for example plugin.path=/opt/kafka-connectors/, with a subfolder for snowflake containing all of its required JARs. This way it will not conflict with the actual classpath of the broker and the other Kafka/ZooKeeper CLI tools that rely on that folder.
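As an illustration, a hypothetical layout (the path and JAR names are placeholders):
# in connect-standalone.properties
plugin.path=/opt/kafka-connectors/

# on disk
/opt/kafka-connectors/
    snowflake/
        snowflake-kafka-connector-<version>.jar
        <all other JARs shipped with the connector>
After fixing the layout and restarting the worker, the startup log should list the Snowflake class among the available connectors.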

Problems with Amazon MSK default configuration and publishing with transactions

Recently we have started testing our Kafka connectors against MSK, Amazon's managed Kafka service. Publishing records seems to work fine, but not when transactions are enabled.
Our cluster consists of 2 brokers (because we have 2 zones) using the default MSK configuration. We are creating our Java Kafka producer using the following properties:
bootstrap.servers=x.us-east-1.amazonaws.com:9094,y.us-east-1.amazonaws.com:9094
client.id=kafkautil
max.block.ms=5000
request.timeout.ms=5000
security.protocol=SSL
transactional.id=transactions
However when the producer was started with the transactional.id setting which enables transactions, the initTransactions() method hangs:
producer = new KafkaProducer<Object, Object>(kafkaProperties);
if (kafkaProperties.containsKey(ProducerConfig.TRANSACTIONAL_ID_CONFIG)) {
    // this hangs
    producer.initTransactions();
}
Looking at the log output, we saw streams of the following, and it never seemed to time out:
TransactionManager - Enqueuing transactional request (type=FindCoordinatorRequest, coordinatorKey=y, coordinatorType=TRANSACTION)
TransactionManager - Request (type=FindCoordinatorRequest, coordinatorKey=y, coordinatorType=TRANSACTION) dequeued for sending
NetworkClient - Found least loaded node z:9094 (id: -2 rack: null) connected with no in-flight requests
Sender - Sending transactional request (type=FindCoordinatorRequest, coordinatorKey=y, coordinatorType=TRANSACTION) to node z (id: -2 rack: null)
NetworkClient - Sending FIND_COORDINATOR {coordinator_key=y,coordinator_type=1} with correlation id 424 to node -2
NetworkClient - Completed receive from node -2 for FIND_COORDINATOR with correlation id 424, received {throttle_time_ms=0,error_code=15,error_message=null,coordinator={node_id=-1,host=,port=-1}}
TransactionManager LogContext.java:129 - Received transactional response FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null)) for request (type=FindCoordinatorRequest, coordinatorKey=xxx, coordinatorType=TRANSACTION)
As far as I can determine, the brokers are available, as is each of the hosts in the bootstrap.servers property. If I connect to each of them and publish without transactions, it works.
Any idea what we are missing?
However when the producer was started with the transactional.id setting which enables transactions, the initTransactions() method hangs:
This turned out to be a problem with the default AWS MSK properties and the number of brokers. If you create a Kafka cluster with fewer than 3 brokers, the following settings will need to be adjusted.
The following settings should be set (I think) to the number of brokers:
Property                                    Kafka Default   AWS Default   Description
default.replication.factor                  1               3             Default replication factor for automatically created topics.
min.insync.replicas                         1               2             Minimum number of replicas that must acknowledge a write for the write to be considered successful.
offsets.topic.replication.factor            3               3             Replication factor for the internal topic that stores consumer offsets.
transaction.state.log.replication.factor    3               3             Replication factor for the transaction topic.
See the Kafka documentation on broker properties (https://kafka.apache.org/documentation/#brokerconfigs).
Because we have 2 brokers, we ended up with:
default.replication.factor=2
min.insync.replicas=2
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=2
This seemed to resolve the issue. IMHO this is a real problem with AWS MSK's default configuration: the defaults should be generated and tuned based on the number of brokers in the cluster.
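If it helps anyone, one way to apply such overrides on MSK is through a custom cluster configuration, for example via the AWS CLI (the configuration name and file contents below are just a sketch for a 2-broker cluster; check the current CLI docs for exact flags):
# msk-overrides.properties
default.replication.factor=2
min.insync.replicas=2
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=2

aws kafka create-configuration \
    --name "two-broker-overrides" \
    --server-properties fileb://msk-overrides.properties
# then attach the returned configuration ARN to the cluster
# with aws kafka update-cluster-configuration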

Apache Flume: kafka.consumer.ConsumerTimeoutException

I'm trying to build a pipeline with Apache Flume:
spooldir -> kafka channel -> hdfs sink
Events reach the Kafka topic without problems, and I can see them with kafkacat. But the Kafka channel can't write files to HDFS via the sink. The error is:
Timed out while waiting for data to come from Kafka
Full log:
2016-02-26 18:25:17,125 (SinkRunner-PollingRunner-DefaultSinkProcessor-SendThread(zoo02:2181)) [DEBUG - org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:717)] Got ping response for sessionid: 0x2524a81676d02aa after 0ms
2016-02-26 18:25:19,127 (SinkRunner-PollingRunner-DefaultSinkProcessor-SendThread(zoo02:2181)) [DEBUG - org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:717)] Got ping response for sessionid: 0x2524a81676d02aa after 1ms
2016-02-26 18:25:21,129 (SinkRunner-PollingRunner-DefaultSinkProcessor-SendThread(zoo02:2181)) [DEBUG - org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:717)] Got ping response for sessionid: 0x2524a81676d02aa after 0ms
2016-02-26 18:25:21,775 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.channel.kafka.KafkaChannel$KafkaTransaction.doTake(KafkaChannel.java:327)] Timed out while waiting for data to come from Kafka
kafka.consumer.ConsumerTimeoutException
    at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:69)
    at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
    at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
    at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
    at org.apache.flume.channel.kafka.KafkaChannel$KafkaTransaction.doTake(KafkaChannel.java:306)
    at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
    at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:374)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:745)
My Flume config is:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c2
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/alex/spoolFlume
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://10.12.0.1:54310/logs/flumetest/
a1.sinks.k1.hdfs.filePrefix = flume-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.channels.c2.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c2.capacity = 10000
a1.channels.c2.transactionCapacity = 1000
a1.channels.c2.brokerList=kafka10:9092,kafka11:9092,kafka12:9092
a1.channels.c2.topic=flume_test_001
a1.channels.c2.zookeeperConnect=zoo00:2181,zoo01:2181,zoo02:2181
# Bind the source and sink to the channel
a1.sources.r1.channels = c2
a1.sinks.k1.channel = c2
With a memory channel instead of the Kafka channel, everything works fine.
Thanks in advance for any ideas!
ConsumerTimeoutException means there has been no new message for a long time; it doesn't mean the connection to Kafka timed out.
From the Kafka documentation (http://kafka.apache.org/documentation.html):
consumer.timeout.ms (default -1): throw a timeout exception to the consumer if no message is available for consumption after the specified interval.
Kafka's ConsumerConfig class has the "consumer.timeout.ms" configuration property, which Kafka sets to -1 by default. Any new Kafka consumer is expected to override the property with a suitable value.
Below is a reference from the Kafka documentation:
consumer.timeout.ms (default -1)
By default, this value is -1 and a consumer blocks indefinitely if no new message is available for consumption. By setting the value to a positive integer, a timeout exception is thrown to the consumer if no message is available for consumption after the specified timeout.
When Flume creates a Kafka channel, it sets consumer.timeout.ms to 100, as seen in the Flume logs at INFO level. That explains why we see a ton of these ConsumerTimeoutExceptions.
level: INFO Post-validation flume configuration contains configuration for agents: [agent]
level: INFO Creating channels
level: DEBUG Channel type org.apache.flume.channel.kafka.KafkaChannel is a custom type
level: INFO Creating instance of channel c1 type org.apache.flume.channel.kafka.KafkaChannel
level: DEBUG Channel type org.apache.flume.channel.kafka.KafkaChannel is a custom type
level: INFO Group ID was not specified. Using flume as the group id.
level: INFO {metadata.broker.list=kafka:9092, request.required.acks=-1, group.id=flume,
zookeeper.connect=zookeeper:2181, consumer.timeout.ms=100, auto.commit.enable=false}
level: INFO Created channel c1
Going by the Flume user guide on Kafka channel settings, I tried to override this value with the setting below, but that didn't seem to work:
agent.channels.c1.kafka.consumer.timeout.ms=5000
Also, we ran a load test pounding data through the channels constantly, and this exception did not occur during the tests.
I read Flume's source code and found that Flume reads the value of the key "timeout" for "consumer.timeout.ms".
So you can configure "consumer.timeout.ms" like this:
agent1.channels.kafka_channel.timeout=-1
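Applied to the asker's configuration above (agent a1, channel c2), that would presumably be:
a1.channels.c2.timeout = -1
Note that -1 makes the channel's consumer block indefinitely instead of throwing; per the documentation quoted above, a positive value simply bounds how long it waits before the (harmless) ConsumerTimeoutException is raised.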