I am a newbie to Kafka and am working on a Spring-Kafka POC. Our Kafka servers are Kerberized, and with all the required configuration we are able to access the Kerberized Kafka server. Now we have another requirement: we also have to consume topics from non-Kerberized (plain) Kafka servers. Can we do this in a single application by creating another KafkaConsumer with its own listener?
Yes; just define a different consumer factory bean for the second consumer.
If you are using Spring Boot's auto-configuration, you will have to declare both consumer factories manually, because the auto-configuration backs off when it finds a user-defined bean of that type.
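A minimal sketch of what that could look like with two consumer factories and two listener container factories. The bean names, broker addresses, and group ids are placeholders, and the Kerberos settings shown are only the SASL/GSSAPI basics; reuse whatever sasl.jaas.config/keytab setup you already have for the secure cluster.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
    import org.springframework.kafka.core.ConsumerFactory;
    import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

    @Configuration
    public class DualClusterConsumerConfig {

        private Map<String, Object> baseProps(String bootstrap, String groupId) {
            Map<String, Object> props = new HashMap<>();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
            props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            return props;
        }

        // Consumer factory for the Kerberized cluster
        @Bean
        public ConsumerFactory<String, String> kerberizedConsumerFactory() {
            Map<String, Object> props = baseProps("secure-broker:9093", "secure-group");
            props.put("security.protocol", "SASL_PLAINTEXT");
            props.put("sasl.kerberos.service.name", "kafka");
            // ...plus the sasl.jaas.config / keytab settings you already use today
            return new DefaultKafkaConsumerFactory<>(props);
        }

        @Bean
        public ConcurrentKafkaListenerContainerFactory<String, String> kerberizedListenerFactory() {
            ConcurrentKafkaListenerContainerFactory<String, String> factory =
                    new ConcurrentKafkaListenerContainerFactory<>();
            factory.setConsumerFactory(kerberizedConsumerFactory());
            return factory;
        }

        // Consumer factory for the plain, non-Kerberized cluster
        @Bean
        public ConsumerFactory<String, String> plainConsumerFactory() {
            return new DefaultKafkaConsumerFactory<>(baseProps("plain-broker:9092", "plain-group"));
        }

        @Bean
        public ConcurrentKafkaListenerContainerFactory<String, String> plainListenerFactory() {
            ConcurrentKafkaListenerContainerFactory<String, String> factory =
                    new ConcurrentKafkaListenerContainerFactory<>();
            factory.setConsumerFactory(plainConsumerFactory());
            return factory;
        }
    }

Each @KafkaListener then points at the factory for its cluster via the containerFactory attribute, e.g. @KafkaListener(topics = "some-topic", containerFactory = "plainListenerFactory").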
I am using the Java Spring Boot framework and am trying to prevent our consumer from creating topics in Kafka by setting config properties.
The configurations are:
On the broker side:
auto.create.topics.enable=true
On the consumer side:
auto.create.topics.enable=false
That is, we set auto topic creation to false for the consumer while it remains true on the broker.
The configs above are not working for us, and if there are other ways to achieve the same thing, we are happy to discuss them.
auto.create.topics.enable is not a consumer config. It needs to be allow.auto.create.topics, but that is only a valid option in kafka-clients version 2.3+.
There may be other Spring-related settings; see the latest comments on Disable auto topic creation from Spring Kafka Consumer.
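As a minimal sketch of where that property would go when building the consumer factory (broker address and group id are placeholders; the property is only honored by kafka-clients 2.3+):

    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    // consumer-side switch; the broker-side auto.create.topics.enable is a separate knob
    props.put("allow.auto.create.topics", false);
    ConsumerFactory<String, String> consumerFactory = new DefaultKafkaConsumerFactory<>(props);

With Spring Boot auto-configuration the same setting can be passed through as spring.kafka.consumer.properties.allow.auto.create.topics=false in application.properties.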
I want to replace an existing JMS message consumer in EJB with an Apache Kafka message consumer. I am not able to figure out how to configure an Apache Kafka consumer with the EJB configuration.
Kafka is not a straight messaging solution (comparable to, say, RabbitMQ), but it can be used as one.
You will need to translate your JMS concepts (topics, queues) into Kafka topics (which are closer to JMS topics).
Also, given that consumers have a configurable/storable start offset, you will need to define these policies for your consumers.
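As a rough sketch of what the Kafka side of such a consumer looks like without JMS (broker address, group id, and topic name are placeholders; auto.offset.reset is where the start-offset policy mentioned above is expressed):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class OrdersConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092");
            props.put("group.id", "order-processing");   // committed offsets are stored per group
            props.put("auto.offset.reset", "earliest");  // start policy when no committed offset exists
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("orders"));  // roughly the JMS "topic"
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    records.forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
                }
            }
        }
    }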
On the Spring Cloud website (https://spring.io/projects/spring-cloud-stream), the available binder options are listed, and among them are the Apache Kafka and Kafka Streams binders.
What's the difference between them?
For what purposes should we choose one over the other?
The Apache Kafka binder is used for basic Kafka client usage (the consumer/producer API).
The Kafka Streams binder is built on top of the base Apache Kafka binder and adds the ability to use the Kafka Streams API.
The Kafka Streams API is a lightweight client library that gives you the functionality to take data from topic(s) in Kafka and write it to other topic(s) in Kafka; it allows you to transform, enhance, filter, join, aggregate, and more.
The Apache Kafka Binder implementation maps each destination to an Apache Kafka topic. The consumer group maps directly to the same Apache Kafka concept. Partitioning also maps directly to Apache Kafka partitions as well.
The binder currently uses the Apache Kafka kafka-clients version 2.3.1. This client can communicate with older brokers (see the Kafka documentation), but certain features may not be available. For example, with versions earlier than 0.11.x.x, native headers are not supported. Also, 0.11.x.x does not support the autoAddPartitions property.
https://docs.spring.io/spring-cloud-stream-binder-kafka/docs/3.1.3/reference/html/spring-cloud-stream-binder-kafka.html#_apache_kafka_binder
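To make the distinction concrete, here is a minimal sketch using the plain Apache Kafka binder with the functional programming model. The application class, function name, and topic mapping are assumptions; the uppercase-in-0/uppercase-out-0 binding names just follow the functional binding convention.

    import java.util.function.Function;

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;

    @SpringBootApplication
    public class UppercaseApplication {

        public static void main(String[] args) {
            SpringApplication.run(UppercaseApplication.class, args);
        }

        // Bound by Spring Cloud Stream as uppercase-in-0 / uppercase-out-0;
        // each record is read from one Kafka topic and written to another.
        @Bean
        public Function<String, String> uppercase() {
            return String::toUpperCase;
        }
    }

The actual topics would then be set in configuration, e.g. via spring.cloud.stream.bindings.uppercase-in-0.destination and spring.cloud.stream.bindings.uppercase-out-0.destination.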
Spring Cloud Stream includes a binder implementation designed explicitly for Apache Kafka Streams binding. With this native integration, a Spring Cloud Stream "processor" application can directly use the Apache Kafka Streams APIs in the core business logic.
Kafka Streams binder implementation builds on the foundations provided by the Spring for Apache Kafka project.
Kafka Streams binder provides binding capabilities for the three major types in Kafka Streams - KStream, KTable and GlobalKTable.
Kafka Streams applications typically follow a model in which the records are read from an inbound topic, apply business logic, and then write the transformed records to an outbound topic. Alternatively, a Processor application with no outbound destination can be defined as well.
https://docs.spring.io/spring-cloud-stream-binder-kafka/docs/3.1.3/reference/html/spring-cloud-stream-binder-kafka.html#_kafka_streams_binder
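For comparison, a sketch of an equivalent processor with the Kafka Streams binder, where the bound function works on a KStream and can use the Streams API (filter, map, join, aggregate) directly in the business logic; class and function names are again placeholders.

    import java.util.function.Function;

    import org.apache.kafka.streams.kstream.KStream;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;

    @SpringBootApplication
    public class StreamProcessorApplication {

        public static void main(String[] args) {
            SpringApplication.run(StreamProcessorApplication.class, args);
        }

        // KStream-in / KStream-out function bound by the Kafka Streams binder;
        // the body is plain Kafka Streams API code.
        @Bean
        public Function<KStream<String, String>, KStream<String, String>> process() {
            return input -> input
                    .filter((key, value) -> value != null && !value.isEmpty())
                    .mapValues(String::toUpperCase);
        }
    }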
How can I enable the idempotency feature for a Kafka source connector?
I know that in Confluent we can override producer configs with producer.* properties in the worker configuration, but what about Apache Kafka itself? Is it the same?
After setting these configs, where can I see the applied configs for my Connect worker?
Confluent doesn't modify the base Kafka Connect properties.
For configuration of the producers used by Kafka source tasks and the consumers used by Kafka sink tasks, the same parameters can be used, but they need to be prefixed with producer. and consumer. respectively.
Starting with 2.3.0, client configuration overrides can be configured individually per connector by using the prefixes producer.override. and consumer.override. for Kafka sources or Kafka sinks respectively.
https://kafka.apache.org/documentation/#connect_running
However, Kafka Connect sources aren't idempotent - KAFKA-7077 & KIP-308
After setting these configs, where can I see the applied configs for my Connect worker?
In the logs; the ProducerConfig or ConsumerConfig values should be printed when the tasks start.
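For illustration, the override would look something like the lines below. enable.idempotence is the standard producer config; the per-connector variant needs Kafka 2.3+ and a worker whose connector.client.config.override.policy permits the override (e.g. All).

    # in the Connect worker properties - applies to the producers of all source tasks
    producer.enable.idempotence=true

    # or per connector (Kafka 2.3+), in that connector's own configuration
    producer.override.enable.idempotence=true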
We are currently on HDF (Hortonworks Dataflow) 3.3.1, which bundles Kafka 2.0.0, and are trying to use Kafka Connect in distributed mode to launch a Google Cloud PubSub sink connector.
We are planning on sending some metadata back into a Kafka topic and need to integrate a Kafka producer into the flush() function of the sink task's Java code.
Would this have a negative impact on the process where Kafka Connect commits the offsets back to Kafka (since we would be adding the overhead of running a Kafka producer before the flush)?
Also, how does Kafka Connect get the Bootstrap servers list from the configuration when it is not specified in the Connector Properties for either the sink or the source? I need to use the same Bootstrap server list to start the producer.
Currently I am changing the config for the sink connector, adding the bootstrap server list as a property and parsing it in the connector's Java code. I would like to use the bootstrap server list from the Kafka Connect worker properties if that is possible.
Kindly help on this.
Thanks in advance.
need to integrate a Kafka producer into the flush() function of the Sink task java code
There is no producer instance exposed in the SinkTask API...
Would this have a negative impact on the process where Kafka Connect commits the offsets back to Kafka (since we would be adding the overhead of running a Kafka producer before the flush)?
I mean, you can add whatever code you want. As far as negative impacts go, that's up to you to benchmark on your own infrastructure. Obviously adding more blocking code makes the other processes slower overall
how does Kafka Connect get the Bootstrap servers list from the configuration when it is not specified in the Connector Properties for either the sink or the source?
Sinks and sources are not workers. Look at connect-distributed.properties
I would like to use the bootstrap server list from the Kafka Connect worker properties if that is possible
It's not possible. Adding extra properties to the sink/source configs is the only way. (Feel free to open a Kafka JIRA requesting such a feature of exposing the worker configs, though.)
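If you go the route of passing the broker list through the connector config, a rough sketch of the sink task side could look like this. The my.metadata.* property names and the metadata topic are made up for illustration, and the producer is one you create and own yourself; nothing here is provided by the Connect framework.

    import java.util.Collection;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    public class MetadataReportingSinkTask extends SinkTask {

        private KafkaProducer<String, String> metadataProducer;
        private String metadataTopic;

        @Override
        public void start(Map<String, String> config) {
            // "my.metadata.bootstrap.servers" and "my.metadata.topic" are hypothetical
            // connector properties you would add to the sink config yourself.
            Properties props = new Properties();
            props.put("bootstrap.servers", config.get("my.metadata.bootstrap.servers"));
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            metadataProducer = new KafkaProducer<>(props);
            metadataTopic = config.get("my.metadata.topic");
        }

        @Override
        public void put(Collection<SinkRecord> records) {
            // deliver the records to Pub/Sub here (omitted)
        }

        @Override
        public void flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
            // anything done here runs before Connect commits offsets, so it adds latency to commits
            metadataProducer.send(new ProducerRecord<>(metadataTopic,
                    "flushed " + currentOffsets.size() + " partitions"));
            metadataProducer.flush();
        }

        @Override
        public void stop() {
            if (metadataProducer != null) {
                metadataProducer.close();
            }
        }

        @Override
        public String version() {
            return "0.1";
        }
    }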