MirrorMaker2 - Custom rename of topics with standalone connector

I want to run MirrorMaker as a standalone connector.
So far I haven't found any documentation about the configuration.
As far as I can tell, the following configuration would replicate myTopic.
In the destination cluster, however, I need the topic to have a different name, foo (not the automatic rename).
Is this directly supported by MirrorSourceConnector or do I need some other means for that?
connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector
tasks.max = 2
topics = myTopic
source.cluster.bootstrap.servers = sourceHost:9092
target.cluster.bootstrap.servers = sinkHost:9092

So the Kafka MirrorMaker source code has a decent README.md.
How you configure it differs depending on whether you're running MM2 directly or inside Kafka Connect. You said directly, which is covered in the linked README.md.
Basically:
By default, replicated topics are renamed based on "source cluster aliases":
topic-1 --> source.topic-1
This can be customized by overriding the replication.policy.separator property (default is a period). If you need more control over how remote topics are defined, you can implement a custom ReplicationPolicy and override replication.policy.class (default is DefaultReplicationPolicy).
This unfortunately means that you cannot rename the topic through configuration alone. (The DefaultReplicationPolicy class only lets you specify the separator, nothing else.) This is probably because the topics to mirror are specified as a regular expression rather than a single topic name (even if your topics property is just a single topic name, it is still treated as a regular expression).
So, back to the docs: ReplicationPolicy is a Java interface in the Kafka Connect source code, so you would need to implement a custom Java class that implements ReplicationPolicy and then ensure it is on the classpath when you run MM2.
Let's imagine you do write such a class and you call it com.moffatt.kafka.connect.mirror.FooReplicationPolicy. A good template for your class is the default (and apparently only) replication policy class that comes with Kafka Connect: DefaultReplicationPolicy. You can see that building your own would not be too difficult. You could easily add a Map - either hard-coded or configured - that looks for specific configured topic names and maps them to the target topic names.
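To make that concrete, here is a rough sketch of what such a class could look like. Everything in it is illustrative: the package, the class name, and the hard-coded myTopic -> foo mapping are assumptions, and it extends DefaultReplicationPolicy so that unmapped topics keep the default alias-prefixed names.

package com.moffatt.kafka.connect.mirror;

import org.apache.kafka.connect.mirror.DefaultReplicationPolicy;

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: rename selected topics on the target cluster,
// fall back to DefaultReplicationPolicy ("alias.topic") for everything else.
public class FooReplicationPolicy extends DefaultReplicationPolicy {

    // Hard-coded here for illustration; a fuller implementation could read
    // this mapping from configuration via configure(Map<String, ?> props).
    private static final Map<String, String> RENAMES = new HashMap<>();
    static {
        RENAMES.put("myTopic", "foo");
    }

    @Override
    public String formatRemoteTopic(String sourceClusterAlias, String topic) {
        String renamed = RENAMES.get(topic);
        if (renamed != null) {
            return renamed;
        }
        return super.formatRemoteTopic(sourceClusterAlias, topic);
    }

    // Caveat: topicSource() and upstreamTopic() inherited from
    // DefaultReplicationPolicy will not recognise the renamed topics, which
    // matters for MM2's cycle detection and checkpointing; a production
    // policy should override those to reverse the mapping as well.
}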
You use your new class by specifying it in the config as:
replication.policy.class = com.moffatt.kafka.connect.mirror.FooReplicationPolicy

Related

Can I tell spring to use compatibility=NONE when auto-registering schemas for a kafka topic

We are using kafka topics for our micro-services to communicate with each other. We are also using schemas to define the content of our topics.
On our various stages we explicitly deploy the schemas for each topic as part of the deployment process. However, on our local developer laptops (where we have a Docker container running a local Kafka and schema-registry instance) we do not want to do this.
We are using Spring-Boot and spring-kafka.
Accordingly, we have the following two config files:
application.yml
spring.kafka.producer.properties.auto.register.schemas=false
application-local.yml
spring.kafka.producer.properties.auto.register.schemas=true
This works well, our schemas are automatically registered with the local schema-registry when we write to a kafka-topic for the first time.
However, after we've made some schema changes, our publish now fails telling us that the new schema is not compatible with the previously installed schema. Checking the local schema registry, we see that the auto-registered schema was registered with compatibility=BACKWARD whereas on our staged registries we work with compatibility=NONE (we're well aware of the issues this may bring with regard to breaking changes -> this is handled in the way we work with our data).
Is there any way to make the auto-registration use NONE instead of BACKWARD?
Any new subject inherits the global compatibility level of the Registry; you cannot set compatibility as part of registering a schema. Changing it requires a separate, out-of-band HTTP request against the Registry's config endpoint, made before you actually produce any data (since producing may auto-register the schema on its own).
During local development, I would suggest simply deleting the subject from your local registry while you iterate on the schema, rather than worrying about local compatibility changes.
You could also set the default compatibility level of the local Schema Registry container to NONE.
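For example (assuming a local registry at http://localhost:8081; the subject name my-topic-value is just a placeholder), either of these does the job:
# drop the locally auto-registered subject so the next produce re-registers it from scratch
curl -X DELETE http://localhost:8081/subjects/my-topic-value
# or change the registry-wide default compatibility level
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" -d '{"compatibility": "NONE"}' http://localhost:8081/config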

Is there a way of telling a sink connector in Kafka Connect how to look for schema entries

I have successfully set up Kafka Connect in distributed mode locally with the Confluent BigQuery connector. The topics are being made available to me by another party; I am simply moving these topics into my Kafka Connect on my local machine, and then to the sink connector (and thus into BigQuery).
Because of the topics being created by someone else, the schema registry is also being managed by them. So in my config, I set "schema.registry.url":https://url-to-schema-registry, but we have multiple topics which all use the same schema entry, which is located at, let's say, https://url-to-schema-registry/subjects/generic-entry-value/versions/1.
What is happening, however, is that Connect is looking for the schema entry based on the topic name. So let's say my topic is my-topic. Connect is looking for the entry at this URL: https://url-to-schema-registry/subjects/my-topic-value/versions/1. But instead, I want to use the entry located at https://url-to-schema-registry/subjects/generic-entry-value/versions/1, and I want to do so for any and all topics.
How can I make this change? I have tried looking at this doc: https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#configuration-details as well as this class: https://github.com/confluentinc/schema-registry/blob/master/schema-serializer/src/main/java/io/confluent/kafka/serializers/subject/TopicRecordNameStrategy.java
but this looks to be a config parameter for the schema registry itself (which I have no control over), not the sink connector. Unless I'm not configuring something correctly.
Is there a way for me to configure my sink connector to look for a specified schema entry like generic-entry-value/versions/..., instead of the default format topic-name-value/versions/...?
The strategy is configurable at the connector level.
e.g.
value.converter.value.subject.name.strategy=...
There are only built-in strategies for Topic and/or RecordName lookups, however. You'll need to write your own class for static lookups of "generic-entry" if you otherwise cannot copy this "generic-entry-value" schema into new subjects, e.g.:
# get the output of this into a file (the POST below expects a JSON body of the form {"schema": "..."})
curl ... https://url-to-schema-registry/subjects/generic-entry-value/versions/1/schema
# upload it again, where "new-entry" is the name of the other topic
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" -d @schema.json https://url-to-schema-registry/subjects/new-entry-value/versions
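If you do go the custom-strategy route instead, a minimal sketch could look like the following. The class name and the fixed subject are assumptions, and the signature shown is the one used by recent versions of the Confluent serializers, where SubjectNameStrategy takes a ParsedSchema:

package com.example.kafka.serializers;

import io.confluent.kafka.schemaregistry.ParsedSchema;
import io.confluent.kafka.serializers.subject.strategy.SubjectNameStrategy;

import java.util.Map;

// Hypothetical strategy: every topic resolves to the same shared subject.
public class StaticSubjectNameStrategy implements SubjectNameStrategy {

    @Override
    public String subjectName(String topic, boolean isKey, ParsedSchema schema) {
        // Ignore the topic name and always look up the shared "generic-entry" subject.
        return isKey ? "generic-entry-key" : "generic-entry-value";
    }

    public void configure(Map<String, ?> configs) {
        // No configuration needed for this sketch.
    }
}

You would then point the connector at it with value.converter.value.subject.name.strategy=com.example.kafka.serializers.StaticSubjectNameStrategy and make sure the class is on the connector plugin's classpath.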

Where can I find info about possible parameters to put in config in a json connector file?

What are the possible parameters that can be passed in the config of a JSON connector file?
Where can I find info on creating my own JSON connector files?
The only properties that are valid for all connectors (not the workers) include the following:
name
connector.class
key.converter / value.converter
tasks.max
(among others)
Section - https://kafka.apache.org/documentation/#connectconfigs
Scroll down to see differences in worker configs and source/sink properties
Beyond that, each connector.class has its own possible configuration values that should be documented elsewhere. For example, Confluent Hub links to specific connector property pages if you are searching in there.
If you are trying to create your own Connector, then the accepted properties are defined in a ConfigDef returned from the connector's config() method.
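As a rough sketch (the property names here are made up for illustration), such a ConfigDef could be declared like this; each define(...) entry becomes a key that may appear in the connector's JSON or properties config:

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;

// Hypothetical configuration definition returned by a custom connector's config() method.
public class MyConnectorConfig {

    public static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define("my.endpoint", Type.STRING, Importance.HIGH,
                    "Endpoint the connector should poll")
            .define("my.poll.interval.ms", Type.LONG, 5000L, Importance.MEDIUM,
                    "How often to poll, in milliseconds");
}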

Kafka Connect configuration and the "consumer." prefix

I was hoping to get some clarification on the kafka connect configuration properties here https://docs.confluent.io/current/connect/userguide.html
We were having issues connecting our Confluent cluster to our Kafka Connect instance. We had all our settings configured correctly, from what I could tell, and didn't have any luck.
After extensive googling, we discovered that prefixing the configuration properties with "consumer." seems to fix the issue. There is a mention of that prefix here https://docs.confluent.io/current/connect/userguide.html#overriding-producer-and-consumer-settings
I am having a hard time wrapping my head around the prefix and how the properties are picked up by Connect and used. It was my assumption that the Java API client used by Kafka Connect would pick up the connection properties from the properties file. It might have some hard-coded configuration properties that can be overridden by specifying the values in the properties file. But is this not correct? The doc linked above mentions
All new producer configs and new consumer configs can be overridden by prefixing them with producer. or consumer.
What are the new configs? The link on that page just takes me to the list of all the configs. The doc mentions
Occasionally, you may have an application that needs to adjust the default settings. One example is a standalone process that runs a log file connector
as the use case for the prefix override, but this is a Connect cluster, so how does that use case apply? I appreciate your time if you have read this far.
The new prefix is probably misleading. Apache Kafka is currently at version 2.3, and back in 0.8 and 0.9 a "new" producer and consumer API was added. These are now just the standard producer and consumer, but the new prefix has hung around.
In terms of overriding configuration, it is as you say; you can prefix any of the standard consumer/producer configs in the Kafka Connect worker with consumer. (for a sink) or producer. (for a source).
Note that as of Apache Kafka 2.3 you can also override these per connector, as detailed in this post: https://www.confluent.io/blog/kafka-connect-improvements-in-apache-kafka-2-3
This post is old, but I'll answer it for people who face the same difficulty:
"New" properties here simply means any custom consumer or producer configs.
And there are two levels:
Worker side: the worker has a consumer to read the configs, status, and offsets of each connector, and a producer to write status and offsets (not to be confused with the __consumer_offsets topic; the Connect offsets topic is only used by source connectors). To override those configs, use:
consumer.* (example: consumer.max.poll.records=10)
producer.* (example: producer.batch.size=10000)
Connector side: a connector inherits the worker config by default; to override consumer/producer configs per connector, use:
consumer.override.* (example: consumer.override.max.poll.records=100)
producer.override.* (example: producer.override.batch.size=20000)
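Note that the connector-level consumer.override.* / producer.override.* settings only take effect if the worker allows them via its connector.client.config.override.policy property, for example:
connector.client.config.override.policy=All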

Can a Kafka Connector load its own name?

According to Kafka Documentation
Connector configurations are simple key-value mappings. For standalone
mode these are defined in a properties file and passed to the Connect
process on the command line.
Most configurations are connector dependent, so they can't be outlined
here. However, there are a few common options:
name - Unique name for the connector. Attempting to register again with the same name will fail.
I have 10 connectors running in standalone mode like this:
bin/connect-standalone.sh config/connect-standalone.properties connector1.properties connector2.properties ...
My question is can a connector load its own name at runtime?
Thanks in advance.
Yes, you can get the name of the connector at runtime.
When a connector starts, all of its properties are passed to Connector::start(Map<String, String> props). The connector can read those properties, validate them, save them, and later pass them on to its Tasks. Whether they are used depends on the Connector implementation.
The connector's name is available under the name property.
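For illustration, a minimal sketch of a source connector that reads its own name in start() could look like this. The class names are hypothetical, and the no-op task exists only to keep the sketch self-contained:

package com.example;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

import java.util.Collections;
import java.util.List;
import java.util.Map;

public class NameAwareSourceConnector extends SourceConnector {

    private Map<String, String> props;
    private String connectorName;

    @Override
    public void start(Map<String, String> props) {
        this.props = props;
        // The unique connector name from connectorN.properties (or the REST payload).
        this.connectorName = props.get("name");
    }

    @Override
    public Class<? extends Task> taskClass() {
        return NoOpTask.class;
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Each task receives the same map, which also contains the "name" key.
        return Collections.singletonList(props);
    }

    @Override
    public void stop() { }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public String version() {
        return "1.0";
    }

    // No-op task stub so the sketch compiles on its own.
    public static class NoOpTask extends SourceTask {
        @Override public String version() { return "1.0"; }
        @Override public void start(Map<String, String> props) { }
        @Override public List<SourceRecord> poll() { return null; }
        @Override public void stop() { }
    }
}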