Trouble connecting to MSK over SSL using Kafka-Connect

I'm having trouble using the AWS MSK TLS endpoints with the Confluent Kafka-Connect image: it times out creating/reading the topics. It works fine when I pass the PlainText endpoints.
I tried referencing the JKS store path available in that Docker image, but it still doesn't work; I'm not sure if I'm missing any other configs. From what I read in the AWS docs, Amazon MSK brokers use public AWS Certificate Manager certificates, therefore any truststore that trusts Amazon Trust Services also trusts the certificates of Amazon MSK brokers.
**Error:**
org.apache.kafka.connect.errors.ConnectException: Timed out while checking for or creating topic(s) '_confluent-command'. This could indicate a connectivity issue, unavailable topic partitions, or if this is your first use of the topic it may have taken too long to create.
Attaching the Kafka-Connect config I'm using. Any help would be great :)
INFO org.apache.kafka.clients.admin.AdminClientConfig - AdminClientConfig values:
bootstrap.servers = [**.us-east-1.amazonaws.com:9094,*.us-east-1.amazonaws.com:9094]
client.dns.lookup = default
client.id =
connections.max.idle.ms = 300000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 120000
retries = 5
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = SSL
security.providers = null
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = JKSStorePath
ssl.truststore.password = ***
ssl.truststore.type = JKS

I used the Java cacerts in the Docker image at /usr/lib/jvm/zulu-8-amd64/jre/lib/security/cacerts as the truststore. With keytool, you can look at the certs:
keytool -list -v -keystore /usr/lib/jvm/zulu-8-amd64/jre/lib/security/cacerts | grep Amazon
It will list the Amazon CAs.
I then started the container using:
docker run -d \
--name=kafka-connect-avro-ssl \
--net=host \
-e CONNECT_BOOTSTRAP_SERVERS=<msk_broker1>:9094,<msk_broker2>:9094,<msk_broker3>:9094 \
-e CONNECT_REST_PORT=28083 \
-e CONNECT_GROUP_ID="quickstart-avro" \
-e CONNECT_CONFIG_STORAGE_TOPIC="avro-config" \
-e CONNECT_OFFSET_STORAGE_TOPIC="avro-offsets" \
-e CONNECT_STATUS_STORAGE_TOPIC="avro-status" \
-e CONNECT_KEY_CONVERTER="io.confluent.connect.avro.AvroConverter" \
-e CONNECT_VALUE_CONVERTER="io.confluent.connect.avro.AvroConverter" \
-e CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL="<hostname of EC2 instance>:8081" \
-e CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL="http://<hostname of EC2 instance>:8081" \
-e CONNECT_INTERNAL_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_INTERNAL_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_REST_ADVERTISED_HOST_NAME="<hostname of EC2 instance>" \
-e CONNECT_LOG4J_ROOT_LOGLEVEL=DEBUG \
-e CONNECT_SECURITY_PROTOCOL=SSL \
-e CONNECT_SSL_TRUSTSTORE_LOCATION=/usr/lib/jvm/zulu-8-amd64/jre/lib/security/cacerts \
-e CONNECT_SSL_TRUSTSTORE_PASSWORD=changeit \
confluentinc/cp-kafka-connect:latest
With that, it started successfully. I was also able to connect to the container, create topics, and produce and consume from within the container. If you're unable to create topics, it could be a network connectivity issue, possibly the security group attached to the MSK cluster blocking port 2181 and the TLS port 9094.
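One related gotcha worth checking: the worker-level SSL settings are not automatically applied to the embedded producer/consumer clients that connectors use, so those may need the same truststore. A sketch of the equivalent worker properties (in the cp-kafka-connect image these map from `CONNECT_PRODUCER_*`/`CONNECT_CONSUMER_*` environment variables; the paths and password here mirror the cacerts example above and are assumptions):

```properties
# connect-distributed.properties (sketch)
security.protocol=SSL
ssl.truststore.location=/usr/lib/jvm/zulu-8-amd64/jre/lib/security/cacerts
ssl.truststore.password=changeit

# repeat for the embedded clients used by connector tasks
producer.security.protocol=SSL
producer.ssl.truststore.location=/usr/lib/jvm/zulu-8-amd64/jre/lib/security/cacerts
producer.ssl.truststore.password=changeit
consumer.security.protocol=SSL
consumer.ssl.truststore.location=/usr/lib/jvm/zulu-8-amd64/jre/lib/security/cacerts
consumer.ssl.truststore.password=changeit
```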

Related

How can I enable SASL in Kafka-Connect (within Cluster)

I have downloaded cp-kafka-connect and deployed it in my k8s cluster with a Kafka broker which accepts secure (SASL) connections.
I would like to enable security (SASL) for Kafka Connect.
I am using a ConfigMap to mount the configuration file named connect-distributed.properties into the cp-kafka-connect container (in /etc/kafka).
Here is the relevant part of the configuration file:
sasl.mechanism=SCRAM-SHA-256
# Configure SASL_SSL if SSL encryption is enabled, otherwise configure SASL_PLAINTEXT
security.protocol=SASL_SSL
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required
username="admin" password="password-secret";
But it is failing to start with an error.
Here are the logs:
kubectl logs test-cp-kafka-connect-846f4b745f-hx2mp
===> ENV Variables ...
ALLOW_UNSIGNED=false
COMPONENT=kafka-connect
CONFLUENT_DEB_VERSION=1
CONFLUENT_PLATFORM_LABEL=
CONFLUENT_VERSION=5.5.0
CONNECT_BOOTSTRAP_SERVERS=PLAINTEXT://test-kafka:9092
CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR=3
CONNECT_CONFIG_STORAGE_TOPIC=test-cp-kafka-connect-config
CONNECT_GROUP_ID=test
CONNECT_INTERNAL_KEY_CONVERTER=org.apache.kafka.connect.json.JsonConverter
CONNECT_INTERNAL_VALUE_CONVERTER=org.apache.kafka.connect.json.JsonConverter
CONNECT_KEY_CONVERTER=io.confluent.connect.avro.AvroConverter
CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE=false
CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL=http://test-cp-schema-registry:8081
CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR=3
CONNECT_OFFSET_STORAGE_TOPIC=test-cp-kafka-connect-offset
CONNECT_PLUGIN_PATH=/usr/share/java,/usr/share/confluent-hub-components
CONNECT_REST_ADVERTISED_HOST_NAME=10.233.85.127
CONNECT_REST_PORT=8083
CONNECT_STATUS_STORAGE_REPLICATION_FACTOR=3
CONNECT_STATUS_STORAGE_TOPIC=test-cp-kafka-connect-status
CONNECT_VALUE_CONVERTER=io.confluent.connect.avro.AvroConverter
CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE=false
CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL=http://test-cp-schema-registry:8081
CUB_CLASSPATH=/etc/confluent/docker/docker-utils.jar
HOME=/root
HOSTNAME=test-cp-kafka-connect-846f4b745f-hx2mp
KAFKA_ADVERTISED_LISTENERS=
KAFKA_HEAP_OPTS=-Xms512M -Xmx512M
KAFKA_JMX_PORT=5555
KAFKA_VERSION=
KAFKA_ZOOKEEPER_CONNECT=
KUBERNETES_PORT=tcp://10.233.0.1:443
KUBERNETES_PORT_443_TCP=tcp://10.233.0.1:443
KUBERNETES_PORT_443_TCP_ADDR=10.233.0.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_HOST=10.233.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
LANG=C.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
PYTHON_PIP_VERSION=8.1.2
PYTHON_VERSION=2.7.9-1
SCALA_VERSION=2.12
SHLVL=1
TEST_0_EXTERNAL_PORT=tcp://10.233.13.164:19092
TEST_0_EXTERNAL_PORT_19092_TCP=tcp://10.233.13.164:19092
TEST_0_EXTERNAL_PORT_19092_TCP_ADDR=10.233.13.164
TEST_0_EXTERNAL_PORT_19092_TCP_PORT=19092
TEST_0_EXTERNAL_PORT_19092_TCP_PROTO=tcp
TEST_0_EXTERNAL_SERVICE_HOST=10.233.13.164
TEST_0_EXTERNAL_SERVICE_PORT=19092
TEST_0_EXTERNAL_SERVICE_PORT_EXTERNAL_BROKER=19092
TEST_CP_KAFKA_CONNECT_PORT=tcp://10.233.38.137:8083
TEST_CP_KAFKA_CONNECT_PORT_8083_TCP=tcp://10.233.38.137:8083
TEST_CP_KAFKA_CONNECT_PORT_8083_TCP_ADDR=10.233.38.137
TEST_CP_KAFKA_CONNECT_PORT_8083_TCP_PORT=8083
TEST_CP_KAFKA_CONNECT_PORT_8083_TCP_PROTO=tcp
TEST_CP_KAFKA_CONNECT_SERVICE_HOST=10.233.38.137
TEST_CP_KAFKA_CONNECT_SERVICE_PORT=8083
TEST_CP_KAFKA_CONNECT_SERVICE_PORT_KAFKA_CONNECT=8083
TEST_KAFKA_EXPORTER_PORT=tcp://10.233.5.215:9308
TEST_KAFKA_EXPORTER_PORT_9308_TCP=tcp://10.233.5.215:9308
TEST_KAFKA_EXPORTER_PORT_9308_TCP_ADDR=10.233.5.215
TEST_KAFKA_EXPORTER_PORT_9308_TCP_PORT=9308
TEST_KAFKA_EXPORTER_PORT_9308_TCP_PROTO=tcp
TEST_KAFKA_EXPORTER_SERVICE_HOST=10.233.5.215
TEST_KAFKA_EXPORTER_SERVICE_PORT=9308
TEST_KAFKA_EXPORTER_SERVICE_PORT_KAFKA_EXPORTER=9308
TEST_KAFKA_MANAGER_PORT=tcp://10.233.7.186:9000
TEST_KAFKA_MANAGER_PORT_9000_TCP=tcp://10.233.7.186:9000
TEST_KAFKA_MANAGER_PORT_9000_TCP_ADDR=10.233.7.186
TEST_KAFKA_MANAGER_PORT_9000_TCP_PORT=9000
TEST_KAFKA_MANAGER_PORT_9000_TCP_PROTO=tcp
TEST_KAFKA_MANAGER_SERVICE_HOST=10.233.7.186
TEST_KAFKA_MANAGER_SERVICE_PORT=9000
TEST_KAFKA_MANAGER_SERVICE_PORT_KAFKA_MANAGER=9000
TEST_KAFKA_PORT=tcp://10.233.12.237:9092
TEST_KAFKA_PORT_8001_TCP=tcp://10.233.12.237:8001
TEST_KAFKA_PORT_8001_TCP_ADDR=10.233.12.237
TEST_KAFKA_PORT_8001_TCP_PORT=8001
TEST_KAFKA_PORT_8001_TCP_PROTO=tcp
TEST_KAFKA_PORT_9092_TCP=tcp://10.233.12.237:9092
TEST_KAFKA_PORT_9092_TCP_ADDR=10.233.12.237
TEST_KAFKA_PORT_9092_TCP_PORT=9092
TEST_KAFKA_PORT_9092_TCP_PROTO=tcp
TEST_KAFKA_SERVICE_HOST=10.233.12.237
TEST_KAFKA_SERVICE_PORT=9092
TEST_KAFKA_SERVICE_PORT_BROKER=9092
TEST_KAFKA_SERVICE_PORT_KAFKASHELL=8001
TEST_ZOOKEEPER_PORT=tcp://10.233.1.144:2181
TEST_ZOOKEEPER_PORT_2181_TCP=tcp://10.233.1.144:2181
TEST_ZOOKEEPER_PORT_2181_TCP_ADDR=10.233.1.144
TEST_ZOOKEEPER_PORT_2181_TCP_PORT=2181
TEST_ZOOKEEPER_PORT_2181_TCP_PROTO=tcp
TEST_ZOOKEEPER_SERVICE_HOST=10.233.1.144
TEST_ZOOKEEPER_SERVICE_PORT=2181
TEST_ZOOKEEPER_SERVICE_PORT_CLIENT=2181
ZULU_OPENJDK_VERSION=8=8.38.0.13
_=/usr/bin/env
appID=dAi5R82Pf9xC38kHkGeAFaOknIUImdmS-1589882527
cluster=test
datacenter=testx
namespace=mynamespace
workspace=8334431b-ef82-414f-9348-a8de032dfca7
===> User
uid=0(root) gid=0(root) groups=0(root)
===> Configuring ...
===> Running preflight checks ...
===> Check if Kafka is healthy ...
[main] INFO org.apache.kafka.clients.admin.AdminClientConfig - AdminClientConfig values:
bootstrap.servers = [PLAINTEXT://test-kafka:9092]
client.dns.lookup = default
client.id =
connections.max.idle.ms = 300000
default.api.timeout.ms = 60000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 2147483647
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 5.5.0-ccs
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: 785a156634af5f7e
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1589883940496
[kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.clients.admin.internals.AdminMetadataManager - [AdminClient clientId=adminclient-1] Metadata update failed
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1589883970509) timed out at 1589883970510 after 281 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
The error is:
[kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.clients.admin.internals.AdminMetadataManager - [AdminClient clientId=adminclient-1] Metadata update failed
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1589883970509) timed out at 1589883970510 after 281 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node
Refer to this approach:
sasl-scram-connect-workers
Can someone help me resolve this issue?
Change your bootstrapServers parameter to point to the SASL listener. For example:
SASL_SSL://test-kafka:9093
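Beyond the bootstrap servers, the same SASL settings usually need to be repeated for Connect's embedded producer and consumer clients. A sketch of the relevant connect-distributed.properties entries (the listener host/port and the credentials are taken from the question and are assumptions):

```properties
bootstrap.servers=SASL_SSL://test-kafka:9093
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-256
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="admin" password="password-secret";

# repeat for the embedded clients used by connector tasks
producer.security.protocol=SASL_SSL
producer.sasl.mechanism=SCRAM-SHA-256
producer.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="admin" password="password-secret";
consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=SCRAM-SHA-256
consumer.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="admin" password="password-secret";
```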

@KafkaListener not recovering after DisconnectException

I have a Kafka consumer (Spring Boot) configured using @KafkaListener. This was running in production and all was good until, as part of maintenance, the brokers were restarted. From the docs, I was expecting that the Kafka listener would recover once the broker was back up. However, this is not what I observed from the logs. The logs stopped with the following exception:
2020-04-22 11:11:28,802|INFO|automator-consumer-app-id-0-C-1|org.apache.kafka.clients.FetchSessionHandler|[Consumer clientId=automator-consumer-app-id-0, groupId=automator-consumer-app-id] Node 10 was unable to process the fetch request with (sessionId=2138208872, epoch=348): FETCH_SESSION_ID_NOT_FOUND.
2020-04-22 11:24:23,798|INFO|automator-consumer-app-id-0-C-1|org.apache.kafka.clients.FetchSessionHandler|[Consumer clientId=automator-consumer-app-id-0, groupId=automator-consumer-app-id] Error sending fetch request (sessionId=499459047, epoch=314160) to node 7: org.apache.kafka.common.errors.DisconnectException.
2020-04-22 11:36:37,241|INFO|automator-consumer-app-id-0-C 1|org.apache.kafka.clients.FetchSessionHandler|[Consumer clientId=automator-consumer-app-id-0, groupId=automator-consumer-app-id] Error sending fetch request (sessionId=2033512553, epoch=342949) to node 4: org.apache.kafka.common.errors.DisconnectException.
Once the application was restarted, connectivity was reestablished. I was wondering if this could be related to any of the consumer configuration below.
2020-04-22 12:46:59,681|INFO|main|org.apache.kafka.clients.consumer.ConsumerConfig|ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [msk-e00-br1.int.bell.ca:9092]
check.crcs = true
client.dns.lookup = default
client.id = automator-consumer-app-id-0
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = automator-consumer-app-id
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
Increase the value of max.incremental.fetch.session.cache.slots. The default value is 1000. You can refer to the answer here: How to check the actual number of incremental fetch session cache slots used in Kafka cluster?
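Note that max.incremental.fetch.session.cache.slots is a broker-side setting, so it goes in the broker's server.properties rather than the consumer config. A minimal sketch (the chosen value is an assumption; size it to the number of concurrent consumers and replica fetchers in your cluster):

```properties
# server.properties (broker side); default is 1000
max.incremental.fetch.session.cache.slots=10000
```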
Basically, your application is continuously listening for messages from the topic; if no messages are published to the topic, you will get this type of exception:
org.apache.kafka.common.errors.DisconnectException: null
Disconnect Exception class
If we start sending messages to the topic, the application will start running and consume those messages.
Here you need to increase the request timeout in your properties file:
consumer.request.timeout.ms
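In a Spring Boot application, arbitrary consumer properties such as this timeout can be passed through the `spring.kafka.consumer.properties.*` namespace. A sketch (the value shown is only an illustration):

```properties
# application.properties (Spring Kafka); value is an example
spring.kafka.consumer.properties.request.timeout.ms=60000
```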

Why is camel kafka producer very slow?

I am using Apache Camel Kafka as the client for producing messages. What I observed is that the Kafka producer takes 1 ms to push a message; if I merge messages into a batch using Camel aggregation, then it takes 100 ms to push a single message.
Brief description of installation
3-node Kafka cluster, 16 cores, 32 GB RAM
Sample Code
String endpoint = "kafka:test?topic=test&brokers=nodekfa:9092,nodekfb:9092,nodekfc:9092&lingerMs=0&maxInFlightRequest=1&producerBatchSize=65536";
Message message = new Message();
String payload = new ObjectMapper().writeValueAsString(message);
StopWatch stopWatch = new StopWatch(); // org.apache.camel.util.StopWatch starts on construction
for (int i = 0; i < size; i++) {
    producerTemplate.sendBody(endpoint, ExchangePattern.InOnly, payload);
}
logger.info("Time taken to push {} messages is {} ms", size, stopWatch.taken());
camel producer endpoint
kafka:[topic]?topic=[topic]&brokers=[brokers]&maxInFlightRequest=1
I am getting a throughput of 1000/s, though the Kafka documentation claims producer throughput of around 100,000 messages/s.
Let me know if there is a bug in camel-kafka or in Kafka itself.
Producer config
acks = 1
batch.size = 65536
bootstrap.servers = [nodekfa:9092, nodekfb:9092, nodekfc:9092]
buffer.memory = 33554432
client.id =
compression.type = none
connections.max.idle.ms = 540000
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 1
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retries = 0
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.StringSerializer
Test Logs
DEBUG [2019-06-02 17:30:46,781] c.g.p.f.u.AuditEventNotifier: >>> Took 3 millis for the exchange on the route : null
DEBUG [2019-06-02 17:30:46,781] c.g.p.f.u.AuditEventNotifier: >>> Took 3 millis to send to external system : kafka://test?brokers=nodekfa%3A9092%2Cnodekfb%3A9092%2Cnodekfc%3A9092&lingerMs=0&maxInFlightRequest=1&producerBatchSize=65536&topic=test by thead http-nio-8551-exec-6
DEBUG [2019-06-02 17:30:46,783] c.g.p.f.u.AuditEventNotifier: >>> Took 2 millis for the exchange on the route : null
DEBUG [2019-06-02 17:30:46,783] c.g.p.f.u.AuditEventNotifier: >>> Took 2 millis to send to external system : kafka://test?brokers=nodekfa%3A9092%2Cnodekfb%3A9092%2Cnodekfc%3A9092&lingerMs=0&maxInFlightRequest=1&producerBatchSize=65536&topic=test by thead http-nio-8551-exec-6
DEBUG [2019-06-02 17:30:46,784] c.g.p.f.u.AuditEventNotifier: >>> Took 1 millis for the exchange on the route : null
DEBUG [2019-06-02 17:30:46,785] c.g.p.f.u.AuditEventNotifier: >>> Took 2 millis to send to external system : kafka://test?brokers=nodekfa%3A9092%2Cnodekfb%3A9092%2Cnodekfc%3A9092&lingerMs=0&maxInFlightRequest=1&producerBatchSize=65536&topic=test by thead http-nio-8551-exec-6
DEBUG [2019-06-02 17:30:46,786] c.g.p.f.u.AuditEventNotifier: >>> Took 1 millis for the exchange on the route : null
DEBUG [2019-06-02 17:30:46,786] c.g.p.f.u.AuditEventNotifier: >>> Took 1 millis to send to external system : kafka://test?brokers=nodekfa%3A9092%2Cnodekfb%3A9092%2Cnodekfc%3A9092&lingerMs=0&maxInFlightRequest=1&producerBatchSize=65536&topic=test by thead http-nio-8551-exec-6
DEBUG [2019-06-02 17:30:46,788] c.g.p.f.u.AuditEventNotifier: >>> Took 2 millis for the exchange on the route : null
DEBUG [2019-06-02 17:30:46,788] c.g.p.f.u.AuditEventNotifier: >>> Took 2 millis to send to external system : kafka://test?brokers=nodekfa%3A9092%2Cnodekfb%3A9092%2Cnodekfc%3A9092&lingerMs=0&maxInFlightRequest=1&producerBatchSize=65536&topic=test by thead http-nio-8551-exec-6
INFO [2019-06-02 17:30:46,788] c.g.p.f.a.MessageApiController: Time taken to push 5 message is 10ms
It is clearly taking a minimum of 1 ms per message. The default worker pool max size is 20; if I set the compression codec to snappy, performance gets worse.
Let me know what I am missing!
I am facing the same issue. Following this email thread https://camel.465427.n5.nabble.com/Kafka-Producer-Performance-tp5785767p5785860.html I used the Aggregate EIP (https://camel.apache.org/manual/latest/aggregate-eip.html) to create batches and got better performance:
from("direct:dp.events")
    .aggregate(constant(true), new ArrayListAggregationStrategy())
    .completionSize(3)
    .to(kafkaUri)
    .to("log:out?groupInterval=1000&groupDelay=500")
    .end();
I get :
INFO Received: 1670 new messages, with total 13949 so far. Last group took: 998 millis which is: 1,673.347 messages per second. average: 1,262.696
This is using one Azure Event Hub with the Kafka protocol and one partition. The weird thing is that when I use another Event Hub with 5 partitions I get worse performance compared to the 1-partition example...
Multiple partitions (UPDATE)
I was able to get 3K messages per second by increasing workerPoolCoreSize and workerPoolMaxSize, in addition to adding partition keys to the messages and adding aggregation before sending to the Kafka endpoint.

Testing Kafka HA and getting, NetworkException: The server disconnected before a response was received

Running Confluent Kafka 4.1.1 (community edition).
I have...
min.insync.replicas = 2
Topic: 1 partition, replica count 3
Total of 3 brokers.
Producer is set to acks = -1
All other producer settings are default.
I launch my application and as it starts to write records to Kafka I down one of the brokers on purpose and I immediately get: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
Based on the settings above, shouldn't the producer write() succeed and not throw an error?
Clarification
I kill a broker on purpose
This only seems to happen if the leader broker is killed?
Without seeing the full config and log messages it's hard to say; still:
In Kafka, all writes go through the leader of the partition. In your setup, out of 3 brokers, you killed 1, so it should be possible to write successfully to the remaining 2 and get an acknowledgement. But if the broker that was killed is the leader, it can result in an exception.
From the docs:
acks=all This means the leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee.
You can in any case set retries to a value higher than 0 and observe the behaviour: a new leader should be elected and your write should eventually succeed.
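The retry behaviour described above can be sketched as producer properties (the values are illustrative assumptions; since the cluster has min.insync.replicas=2 and replication factor 3, acks=all can still succeed with one broker down once a new leader is elected):

```properties
# producer.properties (sketch; values are illustrative)
acks=all
retries=5
retry.backoff.ms=100
# keep message ordering when retries kick in
max.in.flight.requests.per.connection=1
```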
For Spring Cloud Stream with the Kafka binder against Azure Event Hubs for Kafka:
Exception:
{"timestamp":"2020-09-23 23:37:18.541","level":"ERROR","class":"org.springframework.kafka.support.LoggingProducerListener.onError 84", "thread":"kafka-producer-network-thread | producer-2","traceId":"","message":Exception thrown when sending a message with key='null' and payload='{123, 34, 115, 116, 97, 116, 117, 115, 34, 58, 34, 114, 101, 97, 100, 121, 34, 44, 34, 101, 118, 101...' to topic executor-networkexception and partition 3:}
org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
{"timestamp":"2020-09-23 23:37:18.545","level":"WARN ","class":"org.apache.kafka.clients.producer.internals.Sender.completeBatch 568", "thread":"kafka-producer-network-thread | producer-2","traceId":"","message":[Producer clientId=producer-2] Received invalid metadata error in produce request on partition executor-networkexception-3 due to org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.. Going to request metadata update now}
Solution: set up the idle time, retry count, and retry backoff time:
spring:
  cloud:
    stream:
      kafka:
        binder:
          brokers: srsmvsdneventhubstage.servicebus.windows.net:9093
          configuration:
            sasl.jaas.config: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://xxxxx.servicebus.windows.net/;=";'
            sasl.mechanism: PLAIN
            security.protocol: SASL_SSL
            retries: 3
            retry.backoff.ms: 60
            connections.max.idle.ms: 240000
Reference:
http://kafka.apache.org/090/documentation.html#producerconfigs
https://github.com/Azure/azure-event-hubs-for-kafka/blob/master/CONFIGURATION.md (see connections.max.idle.ms)
Logs:
"org.apache.kafka.clients.producer.ProducerConfig.logAll 279", "thread":"hz._hzInstance_1_dev.cached.thread-14","traceId":"","message":ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [srsmvsdneventhubstage.servicebus.windows.net:9093]
buffer.memory = 33554432
client.id =
compression.type = none
connections.max.idle.ms = 540000
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = [hidden]
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = PLAIN
security.protocol = SASL_SSL
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
}
New-
"org.apache.kafka.clients.producer.ProducerConfig.logAll 279", "thread":"hz._hzInstance_1_dev.cached.thread-20","traceId":"","message":ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [xxxxx.servicebus.windows.net:9093]
buffer.memory = 33554432
client.id =
compression.type = none
**connections.max.idle.ms = 240000**
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
**retries = 3**
**retry.backoff.ms = 60**
sasl.client.callback.handler.class = null
sasl.jaas.config = [hidden]
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = PLAIN
security.protocol = SASL_SSL
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
}

StreamsException: No valid committed offset found for input topic

I have a two-node Kafka cluster with replication factor two. My Kafka version is 0.10.2.0.
Previously my application used the 0.11.0.0 Kafka Streams API. This was changed to Kafka Streams 0.11.0.1 to fix the 'CommitFailedException' errors the application was throwing, as suggested at https://issues.apache.org/jira/browse/KAFKA-5786 and https://issues.apache.org/jira/browse/KAFKA-5152
Following are ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [1.1.1.1:9092, 1.1.1.2:9092]
check.crcs = true
client.id = sample-app-0.0.1-27c65ef7-d07f-4619-96be-852f5772d73d-StreamThread-1-consumer
connections.max.idle.ms = 540000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = sample-app-0.0.1
heartbeat.interval.ms = 3000
interceptor.classes = null
internal.leave.group.on.close = false
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 2147483647
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [org.apache.kafka.streams.processor.internals.StreamPartitionAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retry.backoff.ms = 100
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
But one of my applications throws the exception below after running for some time, even though previously committed offsets are available. What could be the possible reason? Please explain.
2017-12-04 12:03:25,410 ERROR c.e.s.c.f.k.s.r.f.SampleStreamsApp [sample-app-0.0.1-096850bc-5f18-4c59-b0c7-63ad00e08aa1-StreamThread-1] No valid committed offset found for input topic sample (partition 0) and no valid reset policy configured. You need to set configuration parameter "auto.offset.reset" or specify a topic specific reset policy via KStreamBuilder#stream(StreamsConfig.AutoOffsetReset offsetReset, ...) or KStreamBuilder#table(StreamsConfig.AutoOffsetReset offsetReset, ...)
org.apache.kafka.streams.errors.StreamsException: No valid committed offset found for input topic sample (partition 0) and no valid reset policy configured. You need to set configuration parameter "auto.offset.reset" or specify a topic specific reset policy via KStreamBuilder#stream(StreamsConfig.AutoOffsetReset offsetReset, ...) or KStreamBuilder#table(StreamsConfig.AutoOffsetReset offsetReset, ...)
at org.apache.kafka.streams.processor.internals.StreamThread.resetInvalidOffsets(StreamThread.java:567)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:538)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:490)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:480)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:457)
Caused by: org.apache.kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partitions: [sample-0]
at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsets(Fetcher.java:425)
at org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsetsIfNeeded(Fetcher.java:254)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateFetchPositions(KafkaConsumer.java:1640)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1083)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:536)
... 3 more
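The error message points at two places a reset policy can come from: the global `auto.offset.reset` consumer setting, or a per-topic override passed to `KStreamBuilder#stream`/`#table`. A minimal sketch of the global form, using plain string keys so it stands alone without the Kafka classpath (the application id and bootstrap servers are taken from the config dump above):

```java
import java.util.Properties;

public class StreamsResetPolicySketch {
    // Builds the Streams/consumer settings that decide what happens when
    // no valid committed offset exists for an input partition.
    static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "sample-app-0.0.1");              // from the config dump above
        props.put("bootstrap.servers", "1.1.1.1:9092,1.1.1.2:9092");  // from the config dump above
        // Without a reset policy ("earliest" or "latest"), a missing or
        // expired committed offset surfaces as NoOffsetForPartitionException.
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("auto.offset.reset"));
    }
}
```

The alternative named in the error message is the topic-level overload of `KStreamBuilder#stream(... offsetReset, ...)`, which overrides the global setting for the named input topic only.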