How to set replication factor in librdkafka? - apache-kafka

I'm using librdkafka to develop a Kafka message producer in C++.
Is there a way to create a topic with a custom replication factor, different from the default one?
CONFIGURATION.md does not explicitly mention any such parameter, but the Kafka tools allow for this.

While automatic topic creation is currently supported by librdkafka, it simply uses the broker's default topic configuration.
What you need is manual topic creation from the client. Broker support for this was added in KIP-4, and it is also exposed through librdkafka's Admin API.
See the rd_kafka_CreateTopics() API.
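As a rough sketch, creating a topic with a custom replication factor through that Admin API could look like the code below. It uses librdkafka's C API (which a C++ program can call directly); the broker address, topic name, partition count, and replication factor are placeholder values, and most error handling is left out.

#include <librdkafka/rdkafka.h>
#include <stdio.h>

int main(void) {
    char errstr[512];

    /* A producer handle doubles as the Admin API client. */
    rd_kafka_conf_t *conf = rd_kafka_conf_new();
    rd_kafka_conf_set(conf, "bootstrap.servers", "localhost:9092", errstr, sizeof(errstr));
    rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));

    /* Topic specification: name, 3 partitions, replication factor 2. */
    rd_kafka_NewTopic_t *topic =
        rd_kafka_NewTopic_new("my-topic", 3, 2, errstr, sizeof(errstr));

    /* Issue the CreateTopics request and wait for the result event. */
    rd_kafka_queue_t *queue = rd_kafka_queue_new(rk);
    rd_kafka_CreateTopics(rk, &topic, 1, NULL /* default AdminOptions */, queue);

    rd_kafka_event_t *ev = rd_kafka_queue_poll(queue, 10 * 1000);
    if (ev && rd_kafka_event_error(ev))
        fprintf(stderr, "CreateTopics failed: %s\n", rd_kafka_event_error_string(ev));
    /* Per-topic results, if needed, are available via rd_kafka_event_CreateTopics_result(). */

    if (ev) rd_kafka_event_destroy(ev);
    rd_kafka_NewTopic_destroy(topic);
    rd_kafka_queue_destroy(queue);
    rd_kafka_destroy(rk);
    return 0;
}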

Related

How do I set up Apache Kafka's replication factor?

I was just wondering how I can set the replication factor in Apache Kafka. I can't find a good tutorial about this; I'm learning it for a mini project.
If you have any links, please share them below.
In the broker properties, you define the replication factors for both internal topics and auto-created topics. Ideally, though, you'd disable auto-creation.
Any other topic-creation API requires the replication factor to be specified, so refer to whatever documentation you can find, starting with the official Kafka website.
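For example, with a recent Kafka release the broker-side defaults and an explicit topic creation would look roughly like this (topic name, partition count, and replication factor are placeholders; older brokers use --zookeeper instead of --bootstrap-server):

# server.properties (broker defaults, including internal topics)
default.replication.factor=3
offsets.topic.replication.factor=3
auto.create.topics.enable=false

# explicit topic creation with its own replication factor
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic my-topic --partitions 6 --replication-factor 3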

Micronaut kafka : unable to use exactly once kafka message semantics

I am using Micronaut Kafka to set up my producer, with the @KafkaClient annotation to set up all the producer config.
Micronaut Kafka lets me set all the parameters needed for a transactional producer.
When I push the message, I get back an exception saying:
io.micronaut.messaging.exceptions.MessagingClientException: Exception sending producer record: Cannot perform 'send' before completing a call to initTransactions when transactions are enabled.
Referring back to the Micronaut documentation, it looks like it is asking you to use the KafkaProducer API to implement this feature.
From what I can tell, the KafkaProducer.initTransactions() method needs to be invoked before starting transactions, and it doesn't look like that is happening.
Has anyone faced a similar issue implementing this?
I guess you are using a single-node cluster for development, right? If so, you should configure transaction.state.log.min.isr=1 and transaction.state.log.replication.factor=1 on your local cluster; their defaults (2 and 3, respectively) assume a multi-broker cluster.
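For a single-broker development setup, that would look roughly like this in the broker's server.properties (values of 1 are only sensible for local testing):

transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1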
There is also a relevant section in the Confluent documentation: https://docs.confluent.io/current/streams/developer-guide/config-streams.html
processing.guarantee
The processing guarantee that should be used. Possible values are "at_least_once" (default) and "exactly_once". Note that if exactly-once processing is enabled, the default for parameter commit.interval.ms changes to 100ms. Additionally, consumers are configured with isolation.level="read_committed" and producers are configured with retries=Integer.MAX_VALUE and enable.idempotence=true per default. Note that "exactly_once" processing requires a cluster of at least three brokers by default, which is the recommended setting for production. For development, you can change this by adjusting the broker settings in both transaction.state.log.replication.factor and transaction.state.log.min.isr to the number of brokers you want to use.

How to override the Kafka Topic configurations in MongoDB Source Connector?

I am using the MongoDB Source Connector to get data from a MongoDB collection into Kafka. The connector automatically creates a topic using the following naming convention:
[prefix_provided_in_the_connector_properties].[db_name].[collection_name]
In the MongoDB Source Connector's documentation, there is no mention of overriding topic configuration such as the number of partitions or the replication factor. I have the following questions:
Is it possible to override the topic configs in the connector.properties file?
If not, is it then done on Kafka's end? If so, can we configure each topic's settings individually, or will it globally affect all topics?
Thank you!
Sounds like you have auto.create.topics.enable=true on your brokers. It is recommended to disable this and enforce manual topic creation.
Connect only creates internal topics for itself. Source connectors should ideally have their topics created ahead of time; otherwise, you get the defaults set in the broker's server.properties. Changing those values will not change existing topics.
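If the prefix, database, and collection names are known up front, the topic can be pre-created with the settings you want, along these lines (all names and numbers below are placeholders following the connector's naming convention; older brokers use --zookeeper instead of --bootstrap-server):

bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic myprefix.mydb.mycollection \
  --partitions 6 --replication-factor 3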

Kafka design questions - Kafka Connect vs. own consumer/producer

I need to understand when to use Kafka Connect vs. our own consumer/producer written by a developer. We are getting Confluent Platform. Also, to achieve a fault-tolerant design, do we have to run the consumer/producer code (jar file) on all the brokers?
Kafka Connect is typically used to connect external systems to Kafka, i.e. to move data between external sources/sinks and Kafka.
Anything that you can do with a connector can be done with a plain producer + consumer.
Readily available connectors just make it easier to connect external sources to Kafka without requiring the developer to write low-level client code.
Some points to remember:
If the source and sink are both the same Kafka cluster, a connector doesn't make sense.
If you are doing change data capture (CDC) from a database and pushing the changes to Kafka, you can use a database source connector.
Resource constraints: Kafka Connect runs as a separate process, so double-check the trade-off you can make between resources and ease of development.
Writing your own connector is well and good, unless someone has already written one. If you are using third-party connectors, you need to check how well they are maintained and/or whether support is available.
do we have to run the consumer/producer code (jar file) on all the brokers?
Don't run client code on the brokers. Let all memory and disk access be reserved for the broker process.
when to use Kafka Connect vs. own consumer/producer
In my experience, these factors should be taken into consideration
You're planning on deploying and monitoring Kafka Connect anyway, and have the available resources to do so. Again, these don't run on the broker machines
You don't plan on changing the connector code very often, because changing it means restarting the whole connector JVM, which may be running other connectors that don't need to be restarted.
You aren't able to integrate your own producer/consumer code into your existing applications, or you simply would rather have a simpler produce/consume loop.
Having structured data not tied to a particular binary format is preferred.
The connector you'd write yourself, or a community connector, is well tested and configurable for your use cases.
Connect has limited options for fault tolerance compared to the raw producer/consumer APIs, which in turn come with the drawbacks of more code to write and more libraries to depend on.
Note: Confluent Platform is still the same Apache Kafka
Kafka Connect:
Kafka Connect is an open-source framework that basically provides two kinds of connectors: sink and source. Kafka Connect is used to fetch/put data from/to a database (or other system) to/from Kafka, so it helps you use various other systems with Kafka. It also helps in tracking changes from databases to Kafka (change data capture, CDC, as mentioned in one of the other answers). The framework maintains offsets, so data can be read/written from a particular position in Kafka or in the external system.
For more details, you can refer to https://docs.confluent.io/current/connect/index.html
The Producer/Consumer:
The producer and consumer are just end clients that produce messages to and consume messages from Kafka topics. They are used where we want to distribute data to the various consumers in a consumer group. This kind of system also maintains the lag and offsets of the consumer groups.
No, you don't need to run any producer/consumer while running Kafka Connect. If you want to check that there is no data loss, you can run a consumer while running source connectors. For sink connectors, the already-produced data can be verified in your database by running the appropriate select queries.

Kafka Connect configuration and the "consumer." prefix

I was hoping to get some clarification on the Kafka Connect configuration properties here: https://docs.confluent.io/current/connect/userguide.html
We were having issues connecting our Kafka Connect instance to our Confluent cluster. We had all our settings configured correctly as far as I could tell, and didn't have any luck.
After extensive googling, we discovered that prefixing the configuration properties with "consumer." seems to fix the issue. That prefix is mentioned here: https://docs.confluent.io/current/connect/userguide.html#overriding-producer-and-consumer-settings
I am having a hard time wrapping my head around the prefix and how the properties are picked up by Connect and used. My assumption was that the Java client used by Kafka Connect would pick up the connection properties from the properties file, and that it might have some hard-coded configuration properties that can be overridden by specifying values in the properties file. But is this not correct? The doc linked above mentions
All new producer configs and new consumer configs can be overridden by prefixing them with producer. or consumer.
What are the new configs? The link on that page just takes me to the list of all the configs. The doc mentions
Occasionally, you may have an application that needs to adjust the default settings. One example is a standalone process that runs a log file connector
as the use case for using the prefix override, but this is a Connect cluster, so how does that use case apply? I appreciate your time if you have read this far.
The "new" naming is probably misleading. Apache Kafka is currently at version 2.3, and back in 0.8 and 0.9 a "new" producer and consumer API was added. These are now just the standard producer and consumer, but the "new" label has hung around.
In terms of overriding configuration, it is as you say; you can prefix any of the standard consumer/producer configs in the Kafka Connect worker with consumer. (for a sink) or producer. (for a source).
Note that as of Apache Kafka 2.3 you can also override these per connector, as detailed in this post : https://www.confluent.io/blog/kafka-connect-improvements-in-apache-kafka-2-3
This post is old, but I'll answer it for people who run into the same difficulty:
By "new" properties, they simply mean any of the current consumer or producer configs.
And there are two levels:
Worker side: the worker has a consumer to read the configs, status, and offsets of each connector, and a producer to write status and offsets (not to be confused with the __consumer_offsets topic: the Connect offsets topic is only used for source connectors). To override those configs, use:
consumer.* (example: consumer.max.poll.records=10)
producer.* (example: producer.batch.size=10000)
Connector side: each connector inherits the worker config by default; to override consumer/producer configs per connector, use:
consumer.override.* (example: consumer.override.max.poll.records=100)
producer.override.* (example: producer.override.batch.size=20000)
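Putting the two levels together, a sketch of the configuration might look like this (the values are only illustrative; per-connector overrides exist since Apache Kafka 2.3 and, as far as I recall, also require connector.client.config.override.policy=All on the worker, since the default policy rejects them):

# connect-distributed.properties (worker level, applies to every connector)
consumer.max.poll.records=10
producer.batch.size=10000
connector.client.config.override.policy=All

# individual connector configuration (overrides the worker defaults)
consumer.override.max.poll.records=100
producer.override.batch.size=20000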