How to pass Apache Kafka Mirrormaker2 config for the producer - apache-kafka

I am currently testing MirrorMaker 2 to replicate data between two clusters. Unfortunately, the producer config does not seem to be picked up by the individual producers, contrary to what is documented in https://github.com/apache/kafka/blob/trunk/connect/mirror/README.md.
My configuration file, simplified:
clusters=INPUT,BACKUP
INPUT.consumer.compression.type=lz4
BACKUP.producer.compression.type=lz4
INPUT->BACKUP.enabled = true
INPUT->BACKUP.topics=mytopic.*
...
Then the log output when running mirrormaker2 (connect-mirror-maker.sh mirrormaker.properties) does not show this option:
INFO ProducerConfig values:
...
compression.type = none
...
The Kafka version in use is 2.7.1.
How can I pass the settings correctly so that the producer actually compresses its output? I also need to pass a few other settings, but once this works the same approach should work for them too.

Two potential solutions:
Enable connector.client.config.override.policy in the MM2 workers' properties file. You need to follow https://docs.confluent.io/platform/current/connect/references/allconfigs.html#override-the-worker-configuration closely.
Launch a Kafka Connect cluster and create the MirrorSourceConnector and MirrorCheckpointConnector one by one, with the producer configs overridden. You will still need to refer to the Confluent documentation above. I picked this approach and it works.
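As an illustration of the second approach, here is a minimal sketch of the request body that could be submitted to a Kafka Connect cluster's REST API (POST /connectors) to create a MirrorSourceConnector with a producer override. The connector name and the bootstrap addresses are hypothetical placeholders, and the sketch assumes the worker permits client overrides (for example connector.client.config.override.policy=All in the worker properties). It only builds and prints the payload; nothing is sent.

```python
import json


def mirror_source_payload(source_alias, target_alias,
                          source_bootstrap, target_bootstrap, topics):
    """Build a connector-creation payload for MirrorSourceConnector."""
    return {
        # Hypothetical connector name, chosen for this example only.
        "name": f"mirror-source-{source_alias}-{target_alias}".lower(),
        "config": {
            "connector.class":
                "org.apache.kafka.connect.mirror.MirrorSourceConnector",
            "source.cluster.alias": source_alias,
            "target.cluster.alias": target_alias,
            "source.cluster.bootstrap.servers": source_bootstrap,
            "target.cluster.bootstrap.servers": target_bootstrap,
            "topics": topics,
            # MirrorMaker copies raw bytes, so byte-array converters are used.
            "key.converter":
                "org.apache.kafka.connect.converters.ByteArrayConverter",
            "value.converter":
                "org.apache.kafka.connect.converters.ByteArrayConverter",
            # Per-connector producer override; only honoured if the worker's
            # connector.client.config.override.policy allows it.
            "producer.override.compression.type": "lz4",
        },
    }


# Bootstrap addresses below are placeholders, not taken from the question.
payload = mirror_source_payload(
    "INPUT", "BACKUP", "input-kafka:9092", "backup-kafka:9092", "mytopic.*"
)
print(json.dumps(payload, indent=2))
```

Once the connector is running, the ProducerConfig values in the worker log should show compression.type = lz4 if the override was accepted.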

Related

Enable kafka source connector idempotency

How can I enable Kafka source connector idempotency feature?
I know that in Confluent we can override producer configs with producer.* properties in the worker configuration, but how about Apache Kafka itself? Is it the same?
After setting these configs where can I see applied configs for my connect worker?
Confluent doesn't modify the base Kafka Connect properties.
For configuration of the producers used by Kafka source tasks and the consumers used by Kafka sink tasks, the same parameters can be used but need to be prefixed with producer. and consumer. respectively
Starting with 2.3.0, client configuration overrides can be configured individually per connector by using the prefixes producer.override. and consumer.override. for Kafka sources or Kafka sinks respectively
https://kafka.apache.org/documentation/#connect_running
However, Kafka Connect sources aren't idempotent; see KAFKA-7077 and KIP-318.
"After setting these configs where can I see applied configs for my connect worker?"
In the logs. They should show the ProducerConfig or ConsumerConfig values when the tasks start.
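To check the applied values without reading the whole log by eye, a small script can pull the most recent "ProducerConfig values:" block out of the worker log. This is only a sketch: the sample log text is modeled on the excerpt quoted in the MirrorMaker question above (the trailing line is made up), and real log lines may carry timestamps and logger names that this simple parser does not depend on.

```python
import re


def client_config_from_log(log_text, config_name="ProducerConfig"):
    """Return the key/value pairs of the last '<config_name> values:' block."""
    values = {}
    in_block = False
    for line in log_text.splitlines():
        if f"{config_name} values:" in line:
            in_block = True
            values = {}  # start over so only the most recent block is kept
            continue
        if in_block:
            match = re.match(r"^\s*([\w.]+) = (.*)$", line)
            if match:
                values[match.group(1)] = match.group(2).strip()
            elif line.strip():
                in_block = False  # any other non-empty line ends the block
    return values


# Hypothetical sample; the last line only marks the end of the block.
sample_log = (
    "INFO ProducerConfig values:\n"
    "\tcompression.type = none\n"
    "INFO some other log line\n"
)

applied = client_config_from_log(sample_log)
print(applied.get("compression.type"))
```

Passing config_name="ConsumerConfig" does the same for the consumers of sink tasks.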

Retrieving Kafka Producer config

Is there a way to collect Kafka Producer configs from the Kafka cluster?
I know that these settings are stored on the client itself.
I am interested at least in client.id and a collection of topics that the producer is publishing to.
There is no such tool provided by Apache Kafka (or Confluent) to acquire this information.
I worked on a team that built a tool called the Stream Registry, which did provide a centralized location for this information.
Maybe you can have a look at kafkacat (available on GitHub).
We find it very helpful in troubleshooting Kafka issues.

If I set configuration at broker level and topic level, which takes precedence?

As per the Apache Kafka documentation, it seems I have the option to set 'min.insync.replicas' at both broker level and topic level. Now my question is: if I set 'min.insync.replicas' at broker level and at topic level, which takes precedence?
Broker-level configurations that relate to topics serve as default values. That means that if you create a topic without specifying a topic-level configuration, it falls back to the broker-level configuration; if you do specify one, the topic-level value takes precedence.
For example, topics that are automatically created (through auto.create.topics.enable) will use the configuration at broker level.
The details are given in the Kafka documentation, in the section Topic-Level Configs:
Configurations pertinent to topics have both a server default as well an optional per-topic override. If no per-topic configuration is given the server default is used. The override can be set at topic creation time by giving one or more --config options.
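The fallback rule can be modeled in a few lines. This is only an illustration of the lookup order, not Kafka's own code, and the values 1 and 2 are made up for the example.

```python
from collections import ChainMap

# Assumed example values: the broker default comes from server.properties,
# the topic override from a --config option at topic creation time.
broker_defaults = {"min.insync.replicas": "1"}
topic_overrides = {"min.insync.replicas": "2"}


def effective_topic_config(overrides, defaults):
    """Look a key up in the topic overrides first, then in the broker defaults."""
    return ChainMap(overrides, defaults)


with_override = effective_topic_config(topic_overrides, broker_defaults)
without_override = effective_topic_config({}, broker_defaults)

print(with_override["min.insync.replicas"])     # 2 (topic-level value wins)
print(without_override["min.insync.replicas"])  # 1 (falls back to the broker)
```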

How to override the Kafka Topic configurations in MongoDB Source Connector?

I am using MongoDB Source Connector to get the data from a MongoDB collection into Kafka. What this connector does is that it automatically creates a topic using the following naming convention:
[prefix_provided_in_the_connector_properties].[db_name].[collection_name]
In the MongoDB Source Connector's documentation, there is no mention of overriding the topic configuration such as number of partitions or replication factor. I have the following questions:
Is it possible to override the topic configs in the connector.properties file?
If not, is it done on Kafka's end instead? If so, can we configure each topic's settings individually, or will it globally affect all the topics?
Thank you!
Sounds like you have auto.create.topics.enable=true on your brokers. It is recommended to disable this and enforce manual topic creation.
Connect only creates internal topics for itself. Source connectors should ideally have their topics created ahead of time; otherwise, you get the defaults set in the broker's server.properties. Changing those values will not change existing topics.

Kafka Connect configuration and the "consumer." prefix

I was hoping to get some clarification on the kafka connect configuration properties here https://docs.confluent.io/current/connect/userguide.html
We were having issues connecting our Confluent Connect cluster to our Kafka Connect instance. As far as I could tell, all our settings were configured correctly, but we didn’t have any luck.
After extensive googling, we discovered that prefixing the configuration properties with “consumer.” seems to fix the issue. That prefix is mentioned here: https://docs.confluent.io/current/connect/userguide.html#overriding-producer-and-consumer-settings
I am having a hard time wrapping my head around the prefix and how the properties are picked up and used by Connect. My assumption was that the Java client used by Kafka Connect picks up the connection properties from the properties file, and that it might have some hard-coded configuration properties that can be overridden by specifying the values in the properties file. Is this not correct? The doc linked above mentions
All new producer configs and new consumer configs can be overridden by prefixing them with producer. or consumer.
What are the new configs? The link on that page just takes me to the list of all the configs. The doc mentions
Occasionally, you may have an application that needs to adjust the default settings. One example is a standalone process that runs a log file connector
as the use case for the prefix override, but this is a Connect cluster, so how does that use case apply? I appreciate your time if you have read this far.
The word "new" is probably misleading. Apache Kafka is currently at version 2.3, and back in 0.8 and 0.9 a "new" producer and consumer API was added. These are now just the standard producer and consumer, but the "new" label has hung around.
In terms of overriding configuration, it is as you say; you can prefix any of the standard consumer/producer configs in the Kafka Connect worker with consumer. (for a sink) or producer. (for a source).
Note that as of Apache Kafka 2.3 you can also override these per connector, as detailed in this post : https://www.confluent.io/blog/kafka-connect-improvements-in-apache-kafka-2-3
This post is old, but I'll answer it for people who face the same difficulty:
By "new" configs, they mean any custom consumer or producer configs.
There are two levels:
Worker side: the worker has its own consumer (to read the configs, status and offsets of each connector) and its own producer (to write status and offsets) for its internal topics [not to be confused with the __consumer_offsets topic: the Connect offset topic is only for source connectors]. Those internal clients are configured by the unprefixed worker properties. The prefixed worker properties set the worker-wide defaults for the consumers of sink tasks and the producers of source tasks:
consumer.* (example: consumer.max.poll.records=10)
producer.* (example: producer.batch.size=10000)
Connector side: each connector inherits the worker config by default. To override the consumer/producer configs for a single connector, use:
consumer.override.* (example: consumer.override.max.poll.records=100)
producer.override.* (example: producer.override.batch.size=20000)
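The two levels can be pictured as layers: the prefixed worker settings form the base, and the connector's *.override.* settings are applied on top. The function below is a simplified model of that layering for illustration, not Kafka Connect's actual implementation. It reuses the example values from above and assumes the worker's connector.client.config.override.policy permits the overrides.

```python
def effective_client_config(worker_props, connector_props, role="producer"):
    """Layer connector-level '<role>.override.*' on worker-level '<role>.*'."""
    worker_prefix = f"{role}."
    override_prefix = f"{role}.override."

    # Base layer: worker properties with the role prefix stripped.
    config = {
        key[len(worker_prefix):]: value
        for key, value in worker_props.items()
        if key.startswith(worker_prefix)
    }
    # Top layer: per-connector overrides replace the worker-level values.
    config.update({
        key[len(override_prefix):]: value
        for key, value in connector_props.items()
        if key.startswith(override_prefix)
    })
    return config


worker_props = {
    "consumer.max.poll.records": "10",
    "producer.batch.size": "10000",
}
connector_props = {
    "consumer.override.max.poll.records": "100",
    "producer.override.batch.size": "20000",
}

producer_cfg = effective_client_config(worker_props, connector_props, "producer")
consumer_cfg = effective_client_config(worker_props, connector_props, "consumer")
print(producer_cfg)  # {'batch.size': '20000'}
print(consumer_cfg)  # {'max.poll.records': '100'}
```

A connector that sets no overrides simply ends up with the worker-level values, for example batch.size = 10000 for its producer.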