I'm new to Kafka. I have created a Kafka consumer with Spring Boot (the spring-kafka dependency). In my app I use consumerFactory and producerFactory beans for configuration, and I have created the Kafka consumer like below.
@RetryableTopic(
attempts = "3",
backoff = @Backoff(delay = 1000, multiplier = 2.0),
autoCreateTopics = "false")
@KafkaListener(topics = "myTopic", groupId = "myGroupId")
public void consume(@Payload(required = false) String message) {
processMessage(message);
}
My configs are like below
@Bean
public ConsumerFactory<String, Object> consumerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, env.getProperty("kafka.consumer.bootstrap.servers"));
config.put(ConsumerConfig.GROUP_ID_CONFIG, env.getProperty("kafka.consumer.group"));
config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
return new DefaultKafkaConsumerFactory<>(config);
}
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, Object> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.getContainerProperties().setCommitLogLevel(LogIfLevelEnabled.Level.DEBUG);
factory.getContainerProperties().setMissingTopicsFatal(false);
return factory;
}
@Bean
public ProducerFactory<String, String> producerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, env.getProperty("kafka.consumer.bootstrap.servers"));
config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
return new DefaultKafkaProducerFactory<>(config);
}
@Bean
public KafkaTemplate<String, String> kafkaTemplate() {
return new KafkaTemplate<>(producerFactory());
}
So I want to consume in parallel, since I may get a lot of messages. What I found about consuming in parallel is that I need to create multiple partitions for a topic and a consumer for each partition. Let's say I have 10 partitions for my topic; then I can have 10 consumers in the same consumer group, each reading one partition. I understand this behavior. But my concern is how I can create several consumers in my application.
Do I have to write multiple Kafka consumers using @KafkaListener with the same functionality? In that case, do I have to write the method below X times if I need X identical consumers?
@RetryableTopic(
attempts = "3",
backoff = @Backoff(delay = 1000, multiplier = 2.0),
autoCreateTopics = "false")
@KafkaListener(topics = "myTopic", groupId = "myGroupId")
public void consume(@Payload(required = false) String message) {
processMessage(message);
}
What are the options or configs I need to achieve parallel consumption with multiple consumers?
Thank you in advance.
The @KafkaListener has this option:
/**
* Override the container factory's {@code concurrency} setting for this listener. May
* be a property placeholder or SpEL expression that evaluates to a {@link Number}, in
* which case {@link Number#intValue()} is used to obtain the value.
* <p>SpEL {@code #{...}} and property placeholders {@code ${...}} are supported.
* @return the concurrency.
* @since 2.2
*/
String concurrency() default "";
See more in docs: https://docs.spring.io/spring-kafka/reference/html/#kafka-listener-annotation
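For example (a minimal sketch based on your listener and factory above, assuming the topic has at least as many partitions as the concurrency you pick), you can set the concurrency either per listener or on the container factory:

// Option 1: per listener - overrides the factory setting for this listener only
@RetryableTopic(
        attempts = "3",
        backoff = @Backoff(delay = 1000, multiplier = 2.0),
        autoCreateTopics = "false")
@KafkaListener(topics = "myTopic", groupId = "myGroupId", concurrency = "10")
public void consume(@Payload(required = false) String message) {
    processMessage(message);
}

// Option 2: on the factory - applies to every listener created from it
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setConcurrency(10); // 10 consumer threads in the same consumer group
    factory.getContainerProperties().setCommitLogLevel(LogIfLevelEnabled.Level.DEBUG);
    factory.getContainerProperties().setMissingTopicsFatal(false);
    return factory;
}

Each concurrent container runs its own KafkaConsumer in the same group, so with 10 partitions a concurrency of 10 gives at most one consumer per partition; setting it higher than the partition count just leaves the extra consumers idle.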
I am writing a Kafka Streams application, and I would like to use two application IDs in it, but I keep getting an error saying "Topology with no input topics will create no stream threads and no global thread, must subscribe to at least one source topic or global table." Could you please let me know where I made a mistake? Thank you so much!
public class KafkaStreamsConfigurations {
...
@Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
@Primary
public KafkaStreamsConfiguration kStreamsConfigs() {
Map<String, Object> props = new HashMap<>();
setDefaults(props);
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "default");
return new KafkaStreamsConfiguration(props);
}
public void setDefaults(Map<String, Object> props) {...}
#Bean("snowplowStreamBuilder")
public StreamsBuilderFactoryBean streamsBuilderFactoryBean() {
Map<String, Object> props = new HashMap<>();
setDefaults(props);
...
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 0);
props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 1);
Properties properties = new Properties();
props.forEach(properties::put);
StreamsBuilderFactoryBean streamsBuilderFactoryBean = new StreamsBuilderFactoryBean();
streamsBuilderFactoryBean.setStreamsConfiguration(properties);
return streamsBuilderFactoryBean;
}
}
Here is my application class.
public class SnowplowStreamsApp {
#Bean("snowplowStreamsApp")
public KStream<String, String> [] startProcessing(
#Qualifier("snowplowStreamBuilder") StreamsBuilder builder) {
KStream<String, String>[] branches = builder.stream(inputTopicPubsubSnowplow, Consumed
.with(Serdes.String(), Serdes.String()))
.mapValues(snowplowEnrichedGoodDataFormatter::formatEnrichedData)
.branch(...);
return branches;
}
}
Name your factory bean DEFAULT_STREAMS_BUILDER_BEAN_NAME instead of snowplowStreamBuilder - otherwise, the default factory bean will be started with no defined streams.
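For example, a minimal sketch of just the renaming, reusing your existing factory method (the rest of your props stay as you have them):

@Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_BUILDER_BEAN_NAME)
public StreamsBuilderFactoryBean streamsBuilderFactoryBean() {
    Map<String, Object> props = new HashMap<>();
    setDefaults(props);
    props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 1);
    // ... your other props
    Properties properties = new Properties();
    props.forEach(properties::put);
    StreamsBuilderFactoryBean streamsBuilderFactoryBean = new StreamsBuilderFactoryBean();
    streamsBuilderFactoryBean.setStreamsConfiguration(properties);
    return streamsBuilderFactoryBean;
}

The @Qualifier("snowplowStreamBuilder") in SnowplowStreamsApp then needs to reference the default bean name instead (or can be dropped if there is only one StreamsBuilder in the context).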
I followed "Intro to Apache Kafka with Spring" tutorial by baeldung.com.
I set up a KafkaConsumerConfig class with the kafkaConsumerFactory method:
private ConsumerFactory<String, String> kafkaConsumerFactory(String groupId) {
Map<String, Object> props = new HashMap<>();
...
props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
...
return new DefaultKafkaConsumerFactory<>(props);
}
and two "custom" factories:
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> fooKafkaListenerContainerFactory() {
return kafkaListenerContainerFactory("foo");
}
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> barKafkaListenerContainerFactory() {
return kafkaListenerContainerFactory("bar");
}
In the MessageListener class, I then used the @KafkaListener annotation to register consumers with a given groupId listening on a topic:
#KafkaListener(topics = "${message.topic.name}", groupId = "foo", containerFactory = "fooKafkaListenerContainerFactory")
public void listenGroupFoo(String message) {
System.out.println("Received Message in group 'foo': " + message);
...
}
#KafkaListener(topics = "${message.topic.name}", groupId = "bar", containerFactory = "barKafkaListenerContainerFactory")
public void listenGroupBar(String message) {
System.out.println("Received Message in group 'bar': " + message);
...
}
In this way there are two groups of consumers, those with groupId "foo" and those with groupId "bar".
Now, if I change the container factory for the "foo" consumers from fooKafkaListenerContainerFactory to barKafkaListenerContainerFactory, like this:
#KafkaListener(topics = "${message.topic.name}", groupId = "foo", containerFactory = "barKafkaListenerContainerFactory")
public void listenGroupFoo(String message) {
...
}
There seems to be an incompatibility between the groupId of the @KafkaListener and the groupId of the container factory, yet nothing changes.
So, what I'm trying to understand is what the props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId) property does and why it seems to be ignored.
The factory groupId is a default which is only used if there is no groupId (or id) on the @KafkaListener.
In early versions, it was only possible to set the groupId on the factory, which meant you needed a separate factory for each listener if different groups were needed; that defeats the idea of a factory that can be used for multiple listeners.
See the javadocs...
/**
* Override the {@code group.id} property for the consumer factory with this value
* for this listener only.
* <p>SpEL {@code #{...}} and property placeholders {@code ${...}} are supported.
* @return the group id.
* @since 1.3
*/
String groupId() default "";
/**
* When {@link #groupId() groupId} is not provided, use the {@link #id() id} (if
* provided) as the {@code group.id} property for the consumer. Set to false, to use
* the {@code group.id} from the consumer factory.
* @return false to disable.
* @since 1.3
*/
boolean idIsGroup() default true;
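So a single container factory can serve listeners in different groups. A sketch based on your setup (kafkaConsumerFactory is your existing private method; the fallback group id "default" is just an assumption):

// One shared container factory for all listeners
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(kafkaConsumerFactory("default")); // only a fallback group id
    return factory;
}

// the groupId on the annotation overrides whatever the factory was configured with
@KafkaListener(topics = "${message.topic.name}", groupId = "foo")
public void listenGroupFoo(String message) {
    System.out.println("Received Message in group 'foo': " + message);
}

@KafkaListener(topics = "${message.topic.name}", groupId = "bar")
public void listenGroupBar(String message) {
    System.out.println("Received Message in group 'bar': " + message);
}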
I need to read the messages from topic1 completely and then read the messages from topic2. I will be receiving messages on these topics once every day. I managed to stop reading messages from topic2 before all the messages in topic1 are read, but this only happens once, when the server is started. Can someone help me with this scenario?
ListenerConfig code
@EnableKafka
@Configuration
public class ListenerConfig {
@Value("${spring.kafka.bootstrap-servers}")
private String bootstrapServers;
@Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.GROUP_ID_CONFIG, "batch");
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "5");
return props;
}
@Bean
public ConsumerFactory<String, String> consumerFactory() {
return new DefaultKafkaConsumerFactory<>(consumerConfigs());
}
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.setBatchListener(true);
return factory;
}
#Bean("kafkaListenerContainerTopic1Factory")
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerTopic1Factory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.getContainerProperties().setIdleEventInterval(60000L);
factory.setBatchListener(true);
return factory;
}
#Bean("kafkaListenerContainerTopic2Factory")
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerTopic2Factory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.setBatchListener(true);
return factory;
}
}
Listener code
@Service
public class Listener {
private static final Logger LOG = LoggerFactory.getLogger(Listener.class);
@Autowired
private KafkaListenerEndpointRegistry registry;
#KafkaListener(id = "first-listener", topics = "topic1", containerFactory = "kafkaListenerContainerTopic1Factory")
public void receive(#Payload List<String> messages,
#Header(KafkaHeaders.RECEIVED_PARTITION_ID) List<Integer> partitions,
#Header(KafkaHeaders.OFFSET) List<Long> offsets) {
for (int i = 0; i < messages.size(); i++) {
LOG.info("received first='{}' with partition-offset='{}'",
messages.get(i), partitions.get(i) + "-" + offsets.get(i));
}
}
#KafkaListener(id = "second-listener", topics = "topic2", containerFactory = "kafkaListenerContaierTopic2Factory" , autoStartup="false" )
public void receiveRel(#Payload List<String> messages,
#Header(KafkaHeaders.RECEIVED_PARTITION_ID) List<Integer> partitions,
#Header(KafkaHeaders.OFFSET) List<Long> offsets) {
for (int i = 0; i < messages.size(); i++) {
LOG.info("received second='{}' with partition-offset='{}'",
messages.get(i), partitions.get(i) + "-" + offsets.get(i));
}
}
@EventListener
public void eventHandler(ListenerContainerIdleEvent event) {
LOG.info("Inside event");
this.registry.getListenerContainer("second-listener").start();
}
Kindly help me resolve this, as this cycle should happen every day: reading topic1's messages completely and then reading the messages from topic2.
You are already using an idle event listener to start the second listener - it should also stop the first listener.
When the second listener goes idle, stop it.
You should be checking which container the event is for to decide which container to stop and/or start.
Then, using a TaskScheduler, schedule a start() of the first listener at the next time you want it to start.
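A sketch of that event handler (not tested; the 24-hour delay and the injected TaskScheduler bean are assumptions you would adapt to your schedule):

@Autowired
private KafkaListenerEndpointRegistry registry;

@Autowired
private TaskScheduler taskScheduler; // e.g. a ThreadPoolTaskScheduler bean

@EventListener
public void eventHandler(ListenerContainerIdleEvent event) {
    String id = event.getListenerId(); // child containers are named "<listener id>-0", "<listener id>-1", ...
    if (id.startsWith("first-listener")) {
        // topic1 is drained: stop this listener and start the one for topic2.
        // The event is published on the consumer thread, so hand the stop() off to another thread.
        taskScheduler.schedule(() -> {
            registry.getListenerContainer("first-listener").stop();
            registry.getListenerContainer("second-listener").start();
        }, Instant.now());
    }
    else if (id.startsWith("second-listener")) {
        // topic2 is drained: stop it and schedule the next daily start of the first listener.
        taskScheduler.schedule(() -> registry.getListenerContainer("second-listener").stop(), Instant.now());
        taskScheduler.schedule(() -> registry.getListenerContainer("first-listener").start(),
                Instant.now().plus(Duration.ofHours(24))); // or compute the next start time of day
    }
}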
A topic in Kafka is an abstraction where a stream of records is published. Streams are naturally unbounded, so they have a start but no defined end. For your case, you first need to clearly define what the end of topic1 and topic2 is, so that you can stop/resume your consumers when needed. Maybe you know how many messages you will process for each topic, so you can use position or committed to stop one consumer and resume the other one at that moment. Or, if you are using a streaming framework, such frameworks usually have a session window where elements are grouped by sessions of activity. You may also prefer to put that logic on the application side, so that you don't need to stop/start any consumer threads.
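For the position-based idea, a rough sketch (assuming a record listener that also receives the Consumer; process() and the follow-up action are placeholders):

@KafkaListener(id = "topic1-listener", topics = "topic1")
public void listen(String message, Consumer<?, ?> consumer) {
    process(message); // your own handling
    // compare the current position with the end offsets of all assigned partitions
    Set<TopicPartition> assigned = consumer.assignment();
    Map<TopicPartition, Long> endOffsets = consumer.endOffsets(assigned);
    boolean caughtUp = assigned.stream()
            .allMatch(tp -> consumer.position(tp) >= endOffsets.get(tp));
    if (caughtUp) {
        // treat this as the "end" of topic1 for today and switch listeners,
        // e.g. via the KafkaListenerEndpointRegistry as in the previous answer
    }
}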
I am using Spring Kafka transactions for my producer and consumer applications.
The requirement is that on the producer side there are multiple steps: send the message to Kafka and then save it to the DB. If saving to the DB fails, I want the message sent to Kafka to be rolled back as well.
So on the consumer side I set isolation.level to read_committed; then, if the message is rolled back in Kafka, the consumer shouldn't read it.
Code for Producer application is:
@Configuration
@EnableKafka
public class KafkaConfiguration {
@Bean
public ProducerFactory<String, Customer> producerFactory() {
DefaultKafkaProducerFactory<String, Customer> pf = new DefaultKafkaProducerFactory<>(producerConfigs());
pf.setTransactionIdPrefix("customer.txn.tx-");
return pf;
}
@Bean
public Map<String, Object> producerConfigs() {
Map<String, Object> props = new HashMap<>();
// create a minimum Producer configs
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "http://127.0.0.1:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
props.put("schema.registry.url", "http://127.0.0.1:8081");
// create safe Producer
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5"); // Kafka 2.0 >= 1.1, so we can keep this at 5; use 1 otherwise.
// high throughput producer (at the expense of a bit of latency and CPU usage)
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
props.put(ProducerConfig.LINGER_MS_CONFIG, "20");
props.put(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(32 * 1024)); // 32 KB batch size
return props;
}
@Bean
public KafkaTemplate<String, Customer> kafkaTemplate() {
return new KafkaTemplate<>(producerFactory());
}
@Bean
public KafkaTransactionManager kafkaTransactionManager(ProducerFactory<String, Customer> producerFactory) {
KafkaTransactionManager<String, Customer> ktm = new KafkaTransactionManager<>(producerFactory);
ktm.setTransactionSynchronization(AbstractPlatformTransactionManager.SYNCHRONIZATION_ON_ACTUAL_TRANSACTION);
return ktm;
}
@Bean
@Primary
public JpaTransactionManager jpaTransactionManager(EntityManagerFactory entityManagerFactory) {
return new JpaTransactionManager(entityManagerFactory);
}
#Bean(name = "chainedTransactionManager")
public ChainedTransactionManager chainedTransactionManager(JpaTransactionManager jpaTransactionManager,
KafkaTransactionManager kafkaTransactionManager) {
return new ChainedTransactionManager(kafkaTransactionManager, jpaTransactionManager);
}
}
@Component
@Slf4j
public class KafkaProducerService {
private KafkaTemplate<String, Customer> kafkaTemplate;
private CustomerConverter customerConverter;
private CustomerRepository customerRepository;
public KafkaProducerService(KafkaTemplate<String, Customer> kafkaTemplate, CustomerConverter customerConverter, CustomerRepository customerRepository) {
this.kafkaTemplate = kafkaTemplate;
this.customerConverter = customerConverter;
this.customerRepository = customerRepository;
}
@Transactional(transactionManager = "chainedTransactionManager", rollbackFor = Exception.class)
public void sendEvents(String topic, CustomerModel customer) {
LOGGER.info("Sending to Kafka: topic: {}, key: {}, customer: {}", topic, customer.getKey(), customer);
// kafkaTemplate.send(topic, customer.getKey(), customerConverter.convertToAvro(customer));
kafkaTemplate.executeInTransaction(kt -> kt.send(topic, customer.getKey(), customerConverter.convertToAvro(customer)));
customerRepository.saveToDb();
}
}
So I explicitly throw an exception in the saveToDb method, and I can see the exception being thrown. But the consumer application can still see the message.
Code for consumer:
@Slf4j
@Configuration
@EnableKafka
public class KafkaConfiguration {
@Bean
ConcurrentKafkaListenerContainerFactory<String, Customer> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, Customer> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.setAfterRollbackProcessor(new DefaultAfterRollbackProcessor<String, Customer>(-1));
// SeekToCurrentErrorHandler errorHandler =
// new SeekToCurrentErrorHandler((record, exception) -> {
// // recover after 3 failures - e.g. send to a dead-letter topic
//// LOGGER.info("***in error handler data, {}", record);
//// LOGGER.info("***in error handler headers, {}", record.headers());
//// LOGGER.info("value: {}", new String(record.headers().headers("springDeserializerExceptionValue").iterator().next().value()));
// }, 3);
//
// factory.setErrorHandler(errorHandler);
return factory;
}
@Bean
public ConsumerFactory<String, Customer> consumerFactory() {
return new DefaultKafkaConsumerFactory<>(consumerConfigs());
}
@Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
// props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer2.class);
props.put(ErrorHandlingDeserializer2.VALUE_DESERIALIZER_CLASS, KafkaAvroDeserializer.class);
props.put("schema.registry.url", "http://127.0.0.1:8081");
props.put("specific.avro.reader", "true");
props.put("isolation.level", "read_committed");
// props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // disable auto commit of offsets
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100"); // max records returned per poll
return props;
}
}
@Component
@Slf4j
public class KafkaConsumerService {
@KafkaListener(id = "demo-consumer-stream-group", topics = "customer.txn")
@Transactional
public void process(ConsumerRecord<String, Customer> record) {
LOGGER.info("Customer key: {} and value: {}", record.key(), record.value());
LOGGER.info("topic: {}, partition: {}, offset: {}", record.topic(), record.partition(), record.offset());
}
}
Did I miss something here?
executeInTransaction will run in a separate transaction. See the javadocs:
/**
* Execute some arbitrary operation(s) on the operations and return the result.
* The operations are invoked within a local transaction and do not participate
* in a global transaction (if present).
* @param callback the callback.
* @param <T> the result type.
* @return the result.
* @since 1.1
*/
<T> T executeInTransaction(OperationsCallback<K, V, T> callback);
Just use send() to participate in the existing transaction.
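For example, a sketch of your sendEvents with executeInTransaction replaced by a plain send() (everything else unchanged):

@Transactional(transactionManager = "chainedTransactionManager", rollbackFor = Exception.class)
public void sendEvents(String topic, CustomerModel customer) {
    LOGGER.info("Sending to Kafka: topic: {}, key: {}, customer: {}", topic, customer.getKey(), customer);
    // send() joins the transaction started by @Transactional; if saveToDb() throws,
    // the Kafka transaction is aborted too and a read_committed consumer never sees the record
    kafkaTemplate.send(topic, customer.getKey(), customerConverter.convertToAvro(customer));
    customerRepository.saveToDb();
}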
I know I can find out which partition a record comes from, but I wonder whether there is a way to dynamically get which partitions are assigned to the consumers at a specific moment. Maybe I need to implement some listener to detect and track partition assignment info?
I am using spring-kafka 1.3.2 with ConcurrentKafkaListenerContainerFactory and @KafkaListener.
Yes, you can do:
@Bean
public ConsumerAwareRebalanceListener rebalanceListener() {
return new ConsumerAwareRebalanceListener() {
@Override
public void onPartitionsAssigned(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
// here partitions
}
};
}
And then add it, for example, to ConcurrentKafkaListenerContainerFactory
@Bean
public ConcurrentKafkaListenerContainerFactory<Object, Object> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<Object, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
ContainerProperties props = factory.getContainerProperties();
props.setConsumerRebalanceListener(rebalanceListener());
return factory;
}
I did it in a different way, by using the KafkaListenerEndpointRegistry:
for (MessageListenerContainer messageListenerContainer : kafkaListenerEndpointRegistry.getListenerContainers()) {
List<KafkaMessageListenerContainer> containers = ((ConcurrentMessageListenerContainer) messageListenerContainer).getContainers();
List<TopicPartition> topicPartitions = (List<TopicPartition>) containers.stream().flatMap(kafkaMessageListenerContainer ->
kafkaMessageListenerContainer.getAssignedPartitions().stream()).collect(Collectors.toList());
partitions.addAll(topicPartitions.stream().map(TopicPartition::partition).collect(Collectors.toList()));
}