How to set max.poll.records in Kafka-Connect API - apache-kafka

I am using confluent-3.0.1 platform and building a Kafka-Elasticsearch connector. For this I am extending SinkConnector and SinkTask (Kafka-connect APIs) to get data from Kafka.
As part of this code i am extending taskConfigs method of SinkConnector to return "max.poll.records" to fetch only 100 records at a time. But its not working and I am getting all records at same time and I am failing to commit offsets within the stipulated time. Please can any one help me to configure "max.poll.records"
public List<Map<String, String>> taskConfigs(int maxTasks) {
ArrayList<Map<String, String>> configs = new ArrayList<Map<String, String>>();
for (int i = 0; i < maxTasks; i++) {
Map<String, String> config = new HashMap<String, String>();
config.put(ConfigurationConstants.CLUSTER_NAME, clusterName);
config.put(ConfigurationConstants.HOSTS, hosts);
config.put(ConfigurationConstants.BULK_SIZE, bulkSize);
config.put(ConfigurationConstants.IDS, elasticSearchIds);
config.put(ConfigurationConstants.TOPICS_SATELLITE_DATA, topics);
config.put(ConfigurationConstants.PUBLISH_TOPIC, topicTopublish);
config.put(ConfigurationConstants.TYPES, elasticSearchTypes);
config.put("max.poll.records", "100");
configs.add(config);
}
return configs;
}

You can't override most Kafka consumer configs like max.poll.records in the connector configuration. You can do so in the Connect worker configuration though, with a consumer. prefix.

It was solved. I added below configuration in connect-avro-standalone.properties
group.id=mygroup
consumer.max.poll.records=1000
and ran below command for running my connector.
sh ./bin/connect-standalone ./etc/schema-registry/connect-avro-standalone.properties ./etc/kafka-connect-elasticsearch/connect-elasticsearch-sink.properties

Related

error handling when consume as a batch in kafka

i have a kafka consumer written in java spring boot (spirng kafka). My consumer is like below.
#RetryableTopic(
attempts = "4",
backoff = #Backoff(delay = 1000, multiplier = 2.0),
autoCreateTopics = "false",
topicSuffixingStrategy = TopicSuffixingStrategy.SUFFIX_WITH_INDEX_VALUE,
include = {ResourceAccessException.class, MyCustomRetryableException.class})
#KafkaListener(topics = "myTopic", groupId = "myGroup", autoStartup = "true", concurrency = "3")
public void consume(#Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
#Header("custom_header_1") String customHeader1,
#Header("custom_header_2") String customHeader2,
#Header("custom_header_3") String customHeader3,
#Header(required = false, name = KafkaHeaders.RECEIVED_MESSAGE_KEY) String key,
#Payload(required = false) String message) {
log.info("-------------------------");
log.info(key);
log.info(message);
log.info("-------------------------");
}
I have used #RetryableTopic annotation to handle errors. I have written a custom exception class and whatever method that throw my custom exception class (MyCustomRetryableException.class), it will retry according to the backoff with number of attempts defined in the retryable annotation. So in here i dont have to do anything. Kafka will simple publish failing messages to the correct dlt topic. All i have to do is create dlt related topic since i have used autoCreateTopics = "false".
Now i'm trying to consume messages in batch wise. I changed my kafka config like below in order to consume in batch wise.
#Bean
public ConsumerFactory<String, Object> consumerFactory() {
Map<String, Object> config = new HashMap<>();
// default configs like bootstrap servers, key and value deserializers are here
config.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "5");
return new DefaultKafkaConsumerFactory<>(config);
}
#Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, Object> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.getContainerProperties().setCommitLogLevel(LogIfLevelEnabled.Level.DEBUG);
factory.setBatchListener(true);
return factory;
}
Now that i have added batch listeners, #RetryableTopic is not supported with it. So how can i achieve the publishing failed messages to DLT task which was previously handled by #RetryableTopic ?
If anyone can answer with an example it would be great. Thank you in advance.
See the documentation.
Use a DefaultErrorHandler with a DeadLetterPublishingRecoverer.
Non blocking retries are not supported; the retries will use the configured BackOff.
Throw a BatchListenerFailedException to indicate which record in the batch failed and just that one will be sent to the DLT.
With any other exception, the whole batch will be retried (and sent to the DLT if retries are exhausted).
https://docs.spring.io/spring-kafka/docs/current/reference/html/#retrying-batch-eh

How to reconfigure an existing confluent kafka producer instance in C#?

I have a kafka producer initialized as follows:
var config = new Dictionary<string, object> { { "bootstrap.servers", BOOTSTRAP_SERVERS } };
Producer _producer = new Producer(config);
After the initialization, when I am about to produce a message on a particular topic I want to add compression to the producer (Only for the topic). I need to add the compression config { "compression.codec", "gzip" }.
How can I do this?

Creating kafka stream API for JSON's

i am trying to write a kafka stream code for converting JSON array to JSON elements...since i am new to kafka stream can any one help me out writing the code.. like what should be there in kstream and ktable..
and my stream of input ll be in the following format
[
{"timestamp":"2017-10-24T12:44:09.359126933+05:30","data":0,"unit":""},
{"timestamp":"2017-10-24T12:44:09.359175426+05:30","data":1,"unit":""}
]
[
{"timestamp":"2017-10-24T12:44:09.359126933+05:30","data":2,"unit":""},
{"timestamp":"2017-10-24T12:44:09.359175426+05:30","data":3,"unit":""}
]
and my output must be in the form
{"timestamp":"2017-10-24T12:44:09.359126933+05:30","data":0,"unit":""}
{"timestamp":"2017-10-24T12:44:09.359175426+05:30","data":1,"unit":""}
{"timestamp":"2017-10-24T12:44:09.359126933+05:30","data":2,"unit":""}
{"timestamp":"2017-10-24T12:44:09.359175426+05:30","data":3,"unit":""}
can anyone help me out in writing the code??
If you want to use Kafka Streams, you can use a flatMap(). Something like
// using new 1.0 API
StreamsBuilder builder = new StreamsBuilder();
builer.stream("topic").flatMap(...).to("output-topic");
Check out the examples and docs for more details:
https://docs.confluent.io/current/streams/developer-guide/index.html
https://github.com/confluentinc/kafka-streams-examples
in Python...
from kafka import KafkaConsumer
consumer = KafkaConsumer('topicName')
for message in consumer:
print(message)
specify bootstrap_servers parameter in KafkaConsumer.
For Java look cloudkarafka, really good:
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList(topic));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records)
System.out.printf("msg = %s\n", record.value());
}
}

Kafka consumer does not start from latest message

I want to have a Kafka Consumer which starts from the latest message in a topic.
Here is the java code:
private static Properties properties = new Properties();
private static KafkaConsumer<String, String> consumer;
static
{
properties.setProperty("bootstrap.servers","localhost");
properties.setProperty("enable.auto.commit", "true");
properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.setProperty("group.id", "test");
properties.setProperty("auto.offset.reset", "latest");
consumer = new KafkaConsumer<>(properties);
consumer.subscribe(Collections.singletonList("mytopic"));
}
#Override
public StreamHandler call() throws Exception
{
while (true)
{
ConsumerRecords<String, String> consumerRecords = consumer.poll(200);
Iterable<ConsumerRecord<String, String>> records = consumerRecords.records("mytopic");
for(ConsumerRecord<String, String> rec : records)
{
System.out.println(rec.value());
}
}
}
Although the value for auto.offset.reset is latest, but the consumer starts form messages which belong to 2 days ago and then it catches up with the latest messages.
What am I missing?
Have you run this same code before with the same group.id? The auto.offset.reset parameter is only used if there is not an existing offset already stored for your consumer. So if you've run the example previously, say two days ago, and then you run it again, it will start from the last consumed position.
Use seekToEnd() if you would like to manually go to the end of the topic.
See https://stackoverflow.com/a/32392174/1392894 for a slightly more thorough discussion of this.
If you want to manually control the position of your offsets you need to set enable.auto.commit = false.
If you want to position all offsets to the end of each partition then call seekToEnd()
https://kafka.apache.org/0102/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToEnd(java.util.Collection)

Can single Kafka producer produce messages to multiple topics and how?

I am just exploring Kafka, currently i am using One producer and One topic to produce messages and it is consumed by one Consumer. very simple.
I was reading the Kafka page, the new Producer API is thread-safe and sharing single instance will improve the performance.
Does it mean i can use single Producer to publish messages to multiple topics?
Never tried it myself, but I guess you can. Since the code for producer and sending the record is (from here https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html):
Producer<String, String> producer = new KafkaProducer<>(props);
for(int i = 0; i < 100; i++)
producer.send(new ProducerRecord<String, String>("my-topic", Integer.toString(i), Integer.toString(i)));
So, I guess, if you just write different topics in the ProducerRecord, than it should be possible.
Also, here http://kafka.apache.org/081/documentation.html#producerapi it explicitly says that you can use a method send(List<KeyedMessage<K,V>> messages) to write into multiple topics.
If I understand you correctly, you are more looking using the same producer instance to send the same/multiple messages on multiple topics.
Not sure about java, but here you can do in C#(.NET) using the Kafka .NET Client DependentProducerBuilder
using (var producer = new ProducerBuilder<string, string>(config).Build())
using (var producer2 = new DependentProducerBuilder<Null, int>(producer.Handle).Build())
{
producer.ProduceAsync("first-topic", new Message<string, string> { Key = "my-key-value", Value = "my-value" });
producer2.ProduceAsync("second-topic", new Message<Null, int> { Value = 42 });
producer2.ProduceAsync("first-topic", new Message<Null, int> { Value = 107 });
producer.Flush(TimeSpan.FromSeconds(10));
}