Kafka producers push events into partition in wrong order - apache-kafka

What is the recommended way to avoid a 'race condition' when pushing multiple events with the same key into a topic?
Producer logs:
13:15:25.503
sending record: ad4709f78d71d887297f7b82552a7963, step: INSERT, topic: reporting-topic, kafka key: x-11111111
13:15:25.621
sending record: ad4709f78d71d887297f7b82552a7963, step: UPDATE, topic: reporting-topic, kafka key: x-11111111
Consumer logs:
13:15:25.646
record: ad4709f78d71d887297f7b82552a7963 was not found, step: UPDATE
13:15:25.790
record: ad4709f78d71d887297f7b82552a7963 was not found after 5 retries, step: UPDATE
13:15:25.794
record: ad4709f78d71d887297f7b82552a7963, step: INSERT
I've noticed that the order of Kafka records inside a partition is broken in some cases (the UPDATE record event arrives before the INSERT record event).
Please advise how to 'fix' this partition offset issue: do I need to set the linger.ms producer config, or is there a more elegant way?
Producer config:
@Bean
public Map<String, Object> producerProperties() {
Map<String, Object> map = new HashMap<>();
map.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
acks.ifPresent(acksConfig -> map.put(ProducerConfig.ACKS_CONFIG, acksConfig));
map.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
map.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
return Collections.unmodifiableMap(map);
}

Related

How to get partitionId and TopicName in KafkaStream application

How do we get topic name and partition id from KafkaStream. For any other Kafka consumer we can get topic name and partitionId like following:
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
System.out.printf("consumed: key = %s, value = %s, partition id= %s, offset = %s%n",
record.key(), record.value(), record.partition(), record.offset());
}
I'm not sure how to get the record reference in KafkaStreams.
You can get the metadata of an input record via the ProcessorContext that is exposed in the Processor API. You can embed the Processor API in the DSL via transform() and similar methods.
Check out the docs for details: https://docs.confluent.io/current/streams/developer-guide/processor-api.html#accessing-processor-context

Kafka Transactional read committed Consumer

I have a transactional producer and a normal producer in my application; both write to the topic kafka-topic, as shown below.
Configuration for the transactional Kafka producer:
@Bean
public Map<String, Object> producerConfigs() {
Map<String, Object> props = new HashMap<>();
// list of host:port pairs used for establishing the initial connections to the Kafka cluster
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.RETRIES_CONFIG, 5);
/*The amount of time to wait before attempting to retry a failed request to a given topic partition.
* This avoids repeatedly sending requests in a tight loop under some failure scenarios.*/
props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 3);
/* The configuration controls the maximum amount of time the client will wait
* for the response of a request. If the response is not received before the timeout
* elapses the client will resend the request if necessary or fail the request if
* retries are exhausted. */
props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 1);
/*To avoid duplicate msg*/
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
/*Will wait for ack from broker on all replicas*/
props.put(ProducerConfig.ACKS_CONFIG, "all");
/*Kafka Transactional Properties */
props.put(ProducerConfig.CLIENT_ID_CONFIG, "transactional-producer");
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "test-transactional-id"); // set transaction id
return props;
}
@Bean
public KafkaProducer<String, String> kafkaProducer() {
return new KafkaProducer<>(producerConfigs());
}
The normal producer config is the same, except that ProducerConfig.CLIENT_ID_CONFIG and ProducerConfig.TRANSACTIONAL_ID_CONFIG are not set.
The consumer config is as follows:
@Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> props = new HashMap<>();
//list of host:port pairs used for establishing the initial connections to the Kafka cluster
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
//allows a pool of processes to divide the work of consuming and processing records
props.put(ConsumerConfig.GROUP_ID_CONFIG, "kafka_group");
//automatically reset the offset to the earliest offset
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
//Auto commit is set false.Will do manual commit
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
/*Kafka Transactional Property ->Controls how to read messages written transactionally
* read_committed - poll transactional messages which have been committed only
* read_uncommitted - will return all messages, even transactional messages
* default is read_uncommitted
* */
props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
return props;
}
@Bean
public ConsumerFactory<String, String> consumerFactory() {
return new DefaultKafkaConsumerFactory<>(consumerConfigs());
}
Since I set isolation.level to read_committed, I expected the consumer to consume only transactional messages from the subscribed topic.
But it is consuming both transactional and non-transactional messages from the topic.
Am I missing any configuration that would make the consumer read only transactional messages from the subscribed topic?
Thanks in advance :-)
It doesn't work that way. isolation.level only pertains to records written by transactional producers: with read_committed, the consumer skips transactional records that were not committed. All consumers see records published by non-transactional producers.
You need to use two different topics to get the behavior you desire.
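To illustrate the point above, here is a deliberately simplified model of what a read_committed consumer sees. It is only a sketch and not how the Kafka client is implemented: the Rec class and the State enum are hypothetical stand-ins, and the model ignores open transactions and the last stable offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReadCommittedSketch {
    // Hypothetical record states for illustration; not part of the Kafka client API.
    enum State { NON_TRANSACTIONAL, COMMITTED, ABORTED }

    // Hypothetical stand-in for a record stored in a topic partition.
    static final class Rec {
        final String value;
        final State state;

        Rec(String value, State state) {
            this.value = value;
            this.state = state;
        }
    }

    // Simplified read_committed view: only aborted transactional records are hidden.
    // Non-transactional records always pass through.
    static List<String> visibleToReadCommitted(List<Rec> log) {
        List<String> out = new ArrayList<>();
        for (Rec r : log) {
            if (r.state != State.ABORTED) {
                out.add(r.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> log = Arrays.asList(
                new Rec("tx-1", State.COMMITTED),
                new Rec("plain-1", State.NON_TRANSACTIONAL),
                new Rec("tx-2", State.ABORTED));
        System.out.println(visibleToReadCommitted(log)); // prints [tx-1, plain-1]
    }
}
```

Because "plain-1" is always visible in this model, keeping transactional and non-transactional traffic apart requires separate topics, as the answer suggests.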

Kafka consumer is reading last committed offset on re-start (Java)

I have a Kafka consumer for which enable.auto.commit is set to false. Whenever I restart my consumer application, it always reads the last committed offset again and then the following offsets.
For example, if the last committed offset is 50, then after a restart the consumer reads offset 50 again before moving on to the next offsets.
I am performing commitSync as shown below.
Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
offsets.put(new TopicPartition("sometopic", partition), new OffsetAndMetadata(offset));
kafkaconsumer.commitSync(offsets);
I tried setting auto.offset.reset to earliest and latest but it is not changing the behavior.
Am I missing something here in consumer configuration ?
config.put(ConsumerConfig.CLIENT_ID_CONFIG, "CLIENT_ID");
config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
config.put(ConsumerConfig.GROUP_ID_CONFIG, "GROUP_ID");
config.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,StringDeserializer.class.getName());
config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,CustomDeserializer.class.getName());
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
If you want to use commitSync(offset) you have to be careful and read its Javadoc:
The committed offset should be the next message your application will consume, i.e. lastProcessedMessageOffset + 1.
If you don't add + 1 to the offset, it is expected that on the next restart the consumer will consume the last message again. As mentioned in the other answer, if you use commitSync() without any argument, you don't have to worry about that.
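The effect of the + 1 can be seen in a small simulation. This is only a sketch with hypothetical helper names (readAfterRestart, offsetToCommit) and no Kafka dependency; it assumes a restarted consumer resumes at exactly the committed offset. In the real client the fix would correspond to passing new OffsetAndMetadata(offset + 1) to commitSync.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitOffsetSketch {
    // Simulates a restart: the consumer reads every offset from the committed
    // position (inclusive) up to the end of the log (exclusive).
    static List<Long> readAfterRestart(long committedOffset, long logEndOffset) {
        List<Long> offsets = new ArrayList<>();
        for (long o = committedOffset; o < logEndOffset; o++) {
            offsets.add(o);
        }
        return offsets;
    }

    // The committed offset should be the next message to consume.
    static long offsetToCommit(long lastProcessedOffset) {
        return lastProcessedOffset + 1;
    }

    public static void main(String[] args) {
        long lastProcessed = 50L;
        long logEnd = 53L; // offsets 0..52 exist in this simulated partition

        // Committing the last processed offset itself: offset 50 is read again.
        System.out.println(readAfterRestart(lastProcessed, logEnd)); // prints [50, 51, 52]

        // Committing lastProcessed + 1: the consumer resumes at 51.
        System.out.println(readAfterRestart(offsetToCommit(lastProcessed), logEnd)); // prints [51, 52]
    }
}
```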
It looks like you're trying to commit using new OffsetAndMetadata(offset). That's not the typical usage.
Here's an example from the documentation, under Manual Offset Control:
List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
buffer.add(record);
}
if (buffer.size() >= minBatchSize) {
insertIntoDb(buffer);
consumer.commitSync();
buffer.clear();
}
}
https://kafka.apache.org/21/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
Notice how the consumer.commitSync() call is performed without any parameters. It simply consumes, and it will commit to whatever was consumed up to that point.

java KafkaConsumer never get results

I'm new to kafka, I have the following sample code :
KafkaConsumer<String,String> kc = new KafkaConsumer<String, String>(props);
while(true) {
List<String> topicNames = Arrays.asList(topics.split(","));
if (!kc.assignment().isEmpty()) {
kc.unsubscribe();
}
kc.subscribe(topicNames);
ConsumerRecords<String, String> recv = kc.poll(1000L);
if (!recv.isEmpty()) {
System.out.println("NOT EMPTY");
}
}
recv is always empty, but if I increase the poll timeout the records are returned; they are also returned if I remove the unsubscribe part.
I've taken this piece of code from a proprietary integration product and I cannot modify it.
So my question is: is this only a timing problem, or is there more to it?
There is a lot that happens when a consumer (re)subscribes to a topic.
Very roughly and as far as I remember the consumer will:
request cluster information
request consumer group metadata
make a JOIN_GROUP request
be assigned certain partitions
The underlying mechanisms are even more complicated if there are more consumers within the same group. That's because the partitions should be reassigned between all the consumers within the group.
That is why:
1000 millis might not be enough for all this and you didn't poll anything in time
you polled something when you increased the timeout because Kafka managed to perform all of these bootstrapping operations
you polled something when you removed the unsubscription to the topics because most likely your consumer was already subscribed
So there is a timing issue. And I think that there is something more - un/subscribing to a topic within an infinite loop makes no sense to me (see the other answer).
You should subscribe to your topics only once at the beginning. Like this:
final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
final ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records)
System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}

Consume only specific partition message

Here is my kafka message producer:
ProducerRecord producerRecord = new ProducerRecord(topic, "k1", message);
producer.send(producerRecord);
Here is my consumer:
TopicPartition partition0 = new TopicPartition(topic, 0);
consumer.assign(Arrays.asList(partition0));
final int minBatchSize = 200;
List<ConsumerRecord<String, byte[]>> buffer = new ArrayList<>();
while (true) {
ConsumerRecords<String, byte[]> records = consumer.poll(100);
for (ConsumerRecord<String, byte[]> record : records) {
buffer.add(record);
System.out.println(record.key() + "KEY: " + record.value());
How is it possible to consume only the topic messages that have k1 as the partition key?
The only way I see to implement such behavior is to have the number of partitions == number of possible keys and have a custom partitioner to maintain key uniqueness for a partition (the default hash partitioner would work, I think). But this solution is very far from optimal and I can't recommend it. Besides that, you can't use any built-in mechanism to achieve similar behavior; you'll have to filter messages on the client side.
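Client-side filtering could look roughly like the sketch below. To keep it runnable without a broker, it uses a hypothetical Rec class in place of ConsumerRecord<String, byte[]>; with the real consumer, the same key check would go inside the loop over the polled records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyFilterSketch {
    // Hypothetical stand-in for ConsumerRecord<String, byte[]>.
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Keeps only the records whose key equals wantedKey; records with a null key are dropped.
    static List<Rec> withKey(List<Rec> polled, String wantedKey) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : polled) {
            if (wantedKey.equals(r.key)) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> polled = Arrays.asList(
                new Rec("k1", "first"),
                new Rec("k2", "second"),
                new Rec(null, "no key"),
                new Rec("k1", "third"));
        for (Rec r : withKey(polled, "k1")) {
            System.out.println(r.key + " -> " + r.value);
        }
    }
}
```

Note that the consumer still fetches every record in the partition; the filter only decides which ones the application processes.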
One proposal is to remember the partition and offset of your specific message, then use assign and seek before polling on the consumer side (also set the consumer's max.poll.records=1, so that each poll fetches a single message).
assign: assigns a specific partition to the consumer.
seek: seeks to a specific offset, so that the next poll returns the expected message with key k1.
Note: this works like a "random" seek, but it reduces message consumption performance.
The new consumer from 0.10 and its max.poll.records config are required.