Spring Kafka difference between ContainerProperties(TopicPartitionOffset... topicPartitions) and ContainerProperties(String... topics)

I am creating a new KafkaMessageListenerContainer using a ContainerProperties.
Using ContainerProperties(String... topics), the Consumer Group looks fine: "state": "STABLE", "isSimpleConsumerGroup": false
Using ContainerProperties(TopicPartitionOffset... topicPartitions), the Consumer Group is not created automatically. It is only created when a message is sent, and even then it doesn't look right: "state": "EMPTY", "isSimpleConsumerGroup": true
What's the difference between them? Did I miss something? I expected the two ContainerProperties constructors to give the same result.
// Explicit partition assignment via TopicPartitionOffset:
ContainerProperties containerProps = new ContainerProperties(tpo.toArray(new TopicPartitionOffset[tpo.size()]));
containerProps.setGroupId(name);
// Topic subscription variant (this one works as expected):
// ContainerProperties containerProps = new ContainerProperties("poc-topic1",
//         "poc-topic2", "poc-topic3");
// containerProps.setGroupId(name);
containerProps.setMessageListener(new TopicMessageListener(name));
DefaultKafkaConsumerFactory<String, Serializable> factory = new DefaultKafkaConsumerFactory<>(
        Utils.get().getConsumerProperties());
container = new KafkaMessageListenerContainer<>(factory, containerProps);
// container.setAutoStartup(true);
// container.setBeanName(name);
// container.checkGroupId();
container.start();

That expectation is not correct. Topic subscription is what creates a consumer group and distributes its partitions between the group members.
When you do explicit partition assignment, no consumer group is involved at all.
See the Kafka consumer documentation for more: https://docs.confluent.io/platform/current/clients/consumer.html#consumer-groups
And respective JavaDocs:
/**
 * Manually assign a list of partitions to this consumer. This interface does not allow for incremental assignment
 * and will replace the previous assignment (if there is one).
 * <p>
 * If the given list of topic partitions is empty, it is treated the same as {@link #unsubscribe()}.
 * <p>
 * Manual topic assignment through this method does not use the consumer's group management
 * functionality. As such, there will be no rebalance operation triggered when group membership or cluster and topic
 * metadata change. Note that it is not possible to use both manual partition assignment with {@link #assign(Collection)}
 * and group assignment with {@link #subscribe(Collection, ConsumerRebalanceListener)}.
 * <p>
 * If auto-commit is enabled, an async commit (based on the old assignment) will be triggered before the new
 * assignment replaces the old one.
 *
 * @param partitions The list of partitions to assign this consumer
 * @throws IllegalArgumentException If partitions is null or contains null or empty topics
 * @throws IllegalStateException If {@code subscribe()} is called previously with topics or pattern
 *             (without a subsequent call to {@link #unsubscribe()})
 */
@Override
public void assign(Collection<TopicPartition> partitions) {
and:
/**
 * Subscribe to the given list of topics to get dynamically assigned partitions.
 * <b>Topic subscriptions are not incremental. This list will replace the current
 * assignment (if there is one).</b> It is not possible to combine topic subscription with group management
 * with manual partition assignment through {@link #assign(Collection)}.
 *
 * If the given list of topics is empty, it is treated the same as {@link #unsubscribe()}.
 *
 * <p>
 * This is a short-hand for {@link #subscribe(Collection, ConsumerRebalanceListener)}, which
 * uses a no-op listener. If you need the ability to seek to particular offsets, you should prefer
 * {@link #subscribe(Collection, ConsumerRebalanceListener)}, since group rebalances will cause partition offsets
 * to be reset. You should also provide your own listener if you are doing your own offset
 * management since the listener gives you an opportunity to commit offsets before a rebalance finishes.
 *
 * @param topics The list of topics to subscribe to
 * @throws IllegalArgumentException If topics is null or contains null or empty elements
 * @throws IllegalStateException If {@code subscribe()} is called previously with pattern, or assign is called
 *             previously (without a subsequent call to {@link #unsubscribe()}), or if not
 *             configured at-least one partition assignment strategy
 */
@Override
public void subscribe(Collection<String> topics) {
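Put differently, ContainerProperties(String... topics) makes the container call consumer.subscribe(), while ContainerProperties(TopicPartitionOffset...) makes it call consumer.assign(). A rough sketch of the difference at the plain-consumer level (topic names, group id and bootstrap address are placeholders):

Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "poc-group");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

// subscribe(): group management and rebalancing between members -> the group
// shows up as STABLE and isSimpleConsumerGroup=false.
Consumer<String, String> subscribing = new KafkaConsumer<>(props);
subscribing.subscribe(Arrays.asList("poc-topic1", "poc-topic2", "poc-topic3"));

// assign(): manual assignment, no group membership -> the group only appears once
// offsets are committed under that group.id, and then looks EMPTY and "simple".
Consumer<String, String> assigning = new KafkaConsumer<>(props);
assigning.assign(Collections.singletonList(new TopicPartition("poc-topic1", 0)));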

Related

send whole batch to dlt without retrying

I'm using Spring Kafka and I have a Kafka consumer written in Java Spring Boot. My consumer consumes batch-wise and the relevant configuration beans are given below.
@Bean
public ConsumerFactory<String, Object> consumerFactory() {
    Map<String, Object> config = new HashMap<>();
    // default configs like bootstrap servers, key and value deserializers are here
    config.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "5");
    return new DefaultKafkaConsumerFactory<>(config);
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.getContainerProperties().setCommitLogLevel(LogIfLevelEnabled.Level.DEBUG);
    factory.setBatchListener(true);
    return factory;
}
I consume messages and send them to an API endpoint. If the API is not available, or if the RestTemplate throws an error, I want to send the whole batch to a DLT without retrying.
If I throw a BatchListenerFailedException, only the message at the given index in the batch is sent to the DLT; BatchListenerFailedException accepts a single index, not a list. What I want is to send the whole batch, as it is, to a DLT topic without retrying. Is there a way to achieve that?
My Spring Kafka version is 2.8.6.
Edit
My default error handler looks like this:
@Bean
public CommonErrorHandler commonErrorHandler() {
    ExponentialBackOffWithMaxRetries exponentialBackOffWithMaxRetries = new ExponentialBackOffWithMaxRetries(5);
    exponentialBackOffWithMaxRetries.setInitialInterval(my val);
    exponentialBackOffWithMaxRetries.setMultiplier(my val);
    exponentialBackOffWithMaxRetries.setMaxInterval(my val);
    DefaultErrorHandler errorHandler = new DefaultErrorHandler(
            new DeadLetterPublishingRecoverer(kafkaTemplate(),
                    (record, exception) -> new TopicPartition(record.topic() + "-dlt", record.partition())),
            exponentialBackOffWithMaxRetries);
    errorHandler.addNotRetryableExceptions(ParseException.class);
    errorHandler.addNotRetryableExceptions(EventHubNonRetryableException.class);
    return errorHandler;
}
In my case I used ExponentialBackOffWithMaxRetries instead of FixedBackOff, and I have 3 scenarios:
1 - Retry messages and then send them to the DLT (throwing any exception other than BatchListenerFailedException).
2 - Send a couple of messages from the batch to the DLT without retrying (using BatchListenerFailedException for this).
3 - Send the whole batch to the DLT without retrying.
The 3rd one is where I'm struggling. If I throw some other exception, it retries a couple of times (even if I use FixedBackOff instead of ExponentialBackOffWithMaxRetries).
Throw something else other than BatchListenerFailedException; use a DefaultErrorHandler with a DeadLetterPublishingRecoverer with no retries (new FixedBackOff(0L, 0L)).
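For scenario 3, a minimal sketch of that handler (reusing the kafkaTemplate() bean and the "-dlt" naming from the question; the bean name is made up) could be:

@Bean
public CommonErrorHandler noRetryBatchErrorHandler() {
    DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(kafkaTemplate(),
            (record, exception) -> new TopicPartition(record.topic() + "-dlt", record.partition()));
    // FixedBackOff(0L, 0L): zero back-off interval and zero retry attempts, so failed
    // records go straight to the recoverer (the DLT) on the first failure.
    return new DefaultErrorHandler(recoverer, new FixedBackOff(0L, 0L));
}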
EDIT
Starting with versions 3.0.0, 2.9.3, 2.8.11, you can configure not retryable exceptions for batch errors.
https://github.com/spring-projects/spring-kafka/issues/2459
See
/**
 * Add exception types to the default list. By default, the following exceptions will
 * not be retried:
 * <ul>
 * <li>{@link DeserializationException}</li>
 * <li>{@link MessageConversionException}</li>
 * <li>{@link ConversionException}</li>
 * <li>{@link MethodArgumentResolutionException}</li>
 * <li>{@link NoSuchMethodException}</li>
 * <li>{@link ClassCastException}</li>
 * </ul>
 * All others will be retried, unless {@link #defaultFalse()} has been called.
 * @param exceptionTypes the exception types.
 * @see #removeClassification(Class)
 * @see #setClassifications(Map, boolean)
 */
@SafeVarargs
@SuppressWarnings("varargs")
public final void addNotRetryableExceptions(Class<? extends Exception>... exceptionTypes) {
    add(false, exceptionTypes);
    notRetryable(Arrays.stream(exceptionTypes));
}
Note that 2.8.x is now out of OSS support. https://spring.io/projects/spring-kafka#support

Spring Cloud Stream Kafka Producer force producer.flush

I am creating a kafka streams/in-out kind of application.
Sample code looks like the following
private MessageChannel output;

public void process(List<String> input) {
    // some logic
    output.send(...);
}
Based on my understanding, Kafka buffers the messages before sending them out. Now, in case the app or container crashes, there is a possibility of losing the buffered messages.
How can we ensure that messages are actually sent out
(something like KafkaTemplate.flush() here)?
EDIT
Based on the suggestion by Gary Russell, we should set the FLUSH header on the last message.
Follow up question -
Given that the last send() call becomes a blocking call because of the blocking nature of KafkaProducer.flush(), if exceptions are thrown while sending the Kafka message (e.g. an IO exception or an auth exception), will they be raised in the same method context?
e.g. in the following code, will the Kafka sender exception be caught in the catch block?
public void process(List<String> input) {
    // some logic
    try {
        Message<?> message = // ...
        message.setHeader(KafkaIntegrationHeaders.FLUSH, true);
        output.send(message);
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}
The underlying KafkaProducerMessageHandler has a property:
/**
 * Specify a SpEL expression that evaluates to a {@link Boolean} to determine whether
 * the producer should be flushed after the send. Defaults to looking for a
 * {@link Boolean} value in a {@link KafkaIntegrationHeaders#FLUSH} header; false if
 * absent.
 * @param flushExpression the {@link Expression}.
 */
public void setFlushExpression(Expression flushExpression) {
Currently, the binder doesn't support customizing this property.
However, if you are sending Message<?>s, you can set the KafkaIntegrationHeaders.FLUSH header to true on the last message in the batch.
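For example, one way to set that header on the last message of the batch (a sketch using Spring's MessageBuilder; input and output are the fields from the question):

for (int i = 0; i < input.size(); i++) {
    boolean last = (i == input.size() - 1);
    Message<String> message = MessageBuilder.withPayload(input.get(i))
            // true only on the last message, so the producer is flushed once per batch
            .setHeader(KafkaIntegrationHeaders.FLUSH, last)
            .build();
    output.send(message);
}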

How to assign partitions before seek on ConsumerSeekCallback?

I get the following exception
java.lang.IllegalStateException: No current assignment for partition
on
((ConsumerSeekCallback) callback).seek(topic, partition, offset);
From the kafka documentation -
void seek​(java.lang.String topic, int partition, long offset)
Perform a seek operation. When called from ConsumerSeekAware.onPartitionsAssigned(Map, ConsumerSeekCallback) or from ConsumerSeekAware.onIdleContainer(Map, ConsumerSeekCallback) perform the seek immediately on the consumer. When called from elsewhere, queue the seek operation to the consumer. The queued seek will occur after any pending offset commits. The consumer must be currently assigned the specified partition.
What should I do if the partition is not assigned?
You can capture which topics/partitions are assigned to you in ConsumerSeekAware, and only perform the seek if you have actually been assigned that topic.
If you extend AbstractConsumerSeekAware, you can call this
/**
 * Return the callback for the specified topic/partition.
 * @param topicPartition the topic/partition.
 * @return the callback (or null if there is no assignment).
 */
@Nullable
protected ConsumerSeekCallback getSeekCallbackFor(TopicPartition topicPartition) {
    return this.callbacks.get(topicPartition);
}

Which kafka property decides Poll frequency for KafkaConsumer?

I am trying to understand Kafka in some detail with respect to Kafka Streams (the Kafka Streams client to Kafka).
I understand that KafkaConsumer (the Java client) gets data from Kafka, however I am not able to understand at what frequency the client polls the Kafka topic to fetch the data.
The frequency of the poll is defined by your code, because you're responsible for calling poll().
A very naive example of user code using KafkaConsumer looks like the following:
public class KafkaConsumerExample {
    ...
    static void runConsumer() throws InterruptedException {
        final Consumer<Long, String> consumer = createConsumer();
        final int giveUp = 100;
        int noRecordsCount = 0;
        while (true) {
            final ConsumerRecords<Long, String> consumerRecords =
                    consumer.poll(1000);
            if (consumerRecords.count() == 0) {
                noRecordsCount++;
                if (noRecordsCount > giveUp) break;
                else continue;
            }
            consumerRecords.forEach(record -> {
                System.out.printf("Consumer Record:(%d, %s, %d, %d)\n",
                        record.key(), record.value(),
                        record.partition(), record.offset());
            });
            consumer.commitAsync();
        }
        consumer.close();
        System.out.println("DONE");
    }
}
In this case the frequency is defined by the duration of processing the messages in consumerRecords.forEach.
However, keep in mind that if you don't call poll "fast enough" your consumer will be considered dead by the broker coordinator and a rebalance will be triggered.
This "fast enough" is determined by the property max.poll.interval.ms in kafka >= 0.10.1.0. See this answer for more details.
max.poll.interval.ms default value is five minutes, so if your consumerRecords.forEach takes longer than that your consumer will be considered dead.
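For example, to give slow processing more headroom you can raise that property and/or reduce how much each poll returns (a sketch; the values are illustrative):

Properties props = new Properties();
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "600000"); // allow up to 10 minutes between polls
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");        // or process fewer records per poll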
If you don't want to use the raw KafkaConsumer directly, you could use Alpakka Kafka, a library for consuming from and producing to Kafka topics in a safe and backpressured way (it is based on Akka Streams).
With this library, the frequency of the poll is determined by the configuration akka.kafka.consumer.poll-interval.
We say it is safe because it keeps polling, so the consumer is not considered dead even when your processing can't keep up with the rate. It is able to do this because KafkaConsumer allows pausing the consumer:
/**
 * Suspend fetching from the requested partitions. Future calls to {@link #poll(Duration)} will not return
 * any records from these partitions until they have been resumed using {@link #resume(Collection)}.
 * Note that this method does not affect partition subscription. In particular, it does not cause a group
 * rebalance when automatic assignment is used.
 * @param partitions The partitions which should be paused
 * @throws IllegalStateException if any of the provided partitions are not currently assigned to this consumer
 */
@Override
public void pause(Collection<TopicPartition> partitions) { ... }
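As a rough hand-rolled illustration of that pause idea (not Alpakka's actual implementation; consumer, executor and process() are assumed to exist elsewhere):

Set<TopicPartition> paused = Collections.emptySet();
Future<?> inFlight = null;
while (true) {
    ConsumerRecords<Long, String> records = consumer.poll(Duration.ofMillis(200));
    if (!records.isEmpty()) {
        paused = consumer.assignment();
        consumer.pause(paused);                              // stop fetching, but keep calling poll()
        inFlight = executor.submit(() -> process(records));  // slow work happens off the polling thread
    }
    if (inFlight != null && inFlight.isDone()) {
        consumer.resume(paused);                             // processing finished, start fetching again
        inFlight = null;
    }
}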
To fully understand this you should read about akka-streams and backpressure.

Spring Kafka Consumer/Listener Group

What is the difference between specifying the group at the consumer
spring.kafka.consumer.group-id
vs specifying it at the @KafkaListener?
@KafkaListener(topics = "test", group = "test-grp")
See the javadocs for the group property; it has nothing to do with the kafka group.id...
/**
 * If provided, the listener container for this listener will be added to a bean
 * with this value as its name, of type {@code Collection<MessageListenerContainer>}.
 * This allows, for example, iteration over the collection to start/stop a subset
 * of containers.
 * @return the bean name for the group.
 */
This has been renamed containerGroup in 1.3/2.0.
Those release versions also provide...
/**
 * Override the {@code group.id} property for the consumer factory with this value
 * for this listener only.
 * @return the group id.
 * @since 1.3
 */
String groupId() default "";

/**
 * When {@link #groupId() groupId} is not provided, use the {@link #id() id} (if
 * provided) as the {@code group.id} property for the consumer. Set to false, to use
 * the {@code group.id} from the consumer factory.
 * @return false to disable.
 * @since 1.3
 */
boolean idIsGroup() default true;
Previously, you needed a container factory/consumer factory for each listener; these allow you to use one factory instance and override the group.id.
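For example (a sketch assuming Spring Kafka 1.3+; the listener id, topic and group are illustrative):

// group.id is overridden for this listener only; other listeners can share the same factory.
@KafkaListener(id = "myListener", topics = "test", groupId = "test-grp")
public void listen(String message) {
    System.out.println("Received: " + message);
}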