There is a useful metric for monitoring Kafka consumer lag in spring-kafka called kafka_consumer_records_lag_max_records, but this metric does not work for transactional consumers. Is there a specific configuration needed to enable the lag metric for transactional consumers?
I have configured my consumer group with isolation level read_committed, and the metric reports kafka_consumer_records_lag_max_records{client_id="listener-1",} -Inf
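For reference, the isolation level is set roughly like this (a minimal sketch of typical Spring Boot configuration; the exact property form is an assumption, not taken from the question):
spring.kafka.consumer.properties.isolation.level=read_committed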
What do you mean by "doesn't work"? I just tested it and it works fine...
@SpringBootApplication
public class So56540759Application {
public static void main(String[] args) throws IOException {
ConfigurableApplicationContext context = SpringApplication.run(So56540759Application.class, args);
System.in.read();
context.close();
}
private MetricName lagNow;
private MetricName lagMax;
@Autowired
private MeterRegistry meters;
#KafkaListener(id = "so56540759", topics = "so56540759", clientIdPrefix = "so56540759",
properties = "max.poll.records=1")
public void listen(String in, Consumer<?, ?> consumer) {
Map<MetricName, ? extends Metric> metrics = consumer.metrics();
Metric currentLag = metrics.get(this.lagNow);
Metric maxLag = metrics.get(this.lagMax);
System.out.println(in
+ " lag " + currentLag.metricName().name() + ":" + currentLag.metricValue()
+ " max " + maxLag.metricName().name() + ":" + maxLag.metricValue());
Gauge gauge = meters.get("kafka.consumer.records.lag.max").gauge();
System.out.println("lag-max in Micrometer: " + gauge.value());
}
@Bean
public NewTopic topic() {
return new NewTopic("so56540759", 1, (short) 1);
}
@Bean
public ApplicationRunner runner(KafkaTemplate<String, String> template) {
Set<String> tags = new HashSet<>();
FetcherMetricsRegistry registry = new FetcherMetricsRegistry(tags, "consumer");
MetricNameTemplate temp = registry.recordsLagMax;
this.lagMax = new MetricName(temp.name(), temp.group(), temp.description(),
Collections.singletonMap("client-id", "so56540759-0"));
temp = registry.partitionRecordsLag;
Map<String, String> tagsMap = new LinkedHashMap<>();
tagsMap.put("client-id", "so56540759-0");
tagsMap.put("topic", "so56540759");
tagsMap.put("partition", "0");
this.lagNow = new MetricName(temp.name(), temp.group(), temp.description(), tagsMap);
return args -> IntStream.range(0, 10).forEach(i -> template.send("so56540759", "foo" + i));
}
}
2019-06-11 12:13:45.803 INFO 32187 --- [ main] o.a.k.clients.consumer.ConsumerConfig : ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [localhost:9092]
check.crcs = true
client.id = so56540759-0
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = so56540759
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_committed
...
transaction.timeout.ms = 60000
...
2019-06-11 12:13:45.840 INFO 32187 --- [o56540759-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [so56540759-0]
foo0 lag records-lag:9.0 max records-lag-max:9.0
lag-max in Micrometer: 9.0
foo1 lag records-lag:8.0 max records-lag-max:9.0
lag-max in Micrometer: 9.0
foo2 lag records-lag:7.0 max records-lag-max:9.0
lag-max in Micrometer: 9.0
foo3 lag records-lag:6.0 max records-lag-max:9.0
lag-max in Micrometer: 9.0
foo4 lag records-lag:5.0 max records-lag-max:9.0
lag-max in Micrometer: 9.0
foo5 lag records-lag:4.0 max records-lag-max:9.0
lag-max in Micrometer: 9.0
foo6 lag records-lag:3.0 max records-lag-max:9.0
lag-max in Micrometer: 9.0
foo7 lag records-lag:2.0 max records-lag-max:9.0
lag-max in Micrometer: 9.0
foo8 lag records-lag:1.0 max records-lag-max:9.0
lag-max in Micrometer: 9.0
foo9 lag records-lag:0.0 max records-lag-max:9.0
lag-max in Micrometer: 9.0
EDIT2
I do see it going to -Infinity in the MBean if a transaction times out - i.e. if the listener doesn't exit within 60 seconds in my test.
spring.kafka.consumer.max-poll-records = 2000 // each record of size 5 KB takes 100 ms, so processing the entire batch takes 2000 * 100 ms = 200 sec, i.e. 3 min 20 sec
spring.kafka.consumer.properties.max.poll.interval.ms = 900000 //15 min
spring.kafka.consumer.properties.fetch.max.wait.ms = 600000 //10 min
spring.kafka.consumer.properties.max.partition.fetch.bytes = 10485760 //10MB
spring.kafka.consumer.fetch-min-size = 5242880 // fetch.min.bytes - 5mb
spring.kafka.listener.concurrency = 1
With the above configuration, the consumer keeps polling at varying intervals, sometimes every 2 or 3 minutes, even though fetch.max.wait.ms is set to 10 minutes.
How is this happening? What is the order of precedence for polling: is it based on max-poll-records, fetch.max.wait.ms, max.partition.fetch.bytes, or fetch-min-size?
EDIT:
I tried maximizing the attribute values below to fetch the maximum number of records, but I still see only 200 to 300 records being processed per poll.
spring.kafka.consumer.properties.max.poll.interval.ms = 900000 //15 min
spring.kafka.consumer.max-poll-records = 2000 // each record of size 5 KB takes 100 ms, so processing the entire batch takes 2000 * 100 ms = 200 sec, i.e. 3 min 20 sec, which is way less than the max poll interval (15 min)
spring.kafka.consumer.properties.fetch.max.wait.ms = 600000 //10 min
spring.kafka.consumer.properties.max.partition.fetch.bytes = 20971520 // 20 MB
spring.kafka.consumer.fetch-min-size = 104857600 // fetch.min.bytes (104857600 bytes = 100 MB)
Am I doing something wrong?
Assuming the records are exactly 5 KB, the poll will return when roughly 1k records are received or 10 minutes elapse, whichever happens first.
You will only ever get max.poll.records if that many records are immediately available.
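For reference, a rough sketch of the arithmetic behind that statement, using the first configuration above (the exact per-record size is an assumption):
fetch.min.bytes = 5242880 bytes (~5 MB)
record size ~ 5 KB = 5120 bytes
records needed ~ 5242880 / 5120 ~ 1024 (about 1k records)
fetch.max.wait.ms = 600000 (10 minutes)
The broker answers once either threshold is met, so raising max.poll.records alone does not change when the fetch returns.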
EDIT
It looks to me like these "smaller" batches are the remnants of the previous fetch; with this code:
@SpringBootApplication
public class So68201599Application {
private static final Logger log = LoggerFactory.getLogger(So68201599Application.class);
public static void main(String[] args) {
SpringApplication.run(So68201599Application.class, args);
}
#KafkaListener(id = "so68201599", topics = "so68201599", autoStartup = "false")
public void listen(ConsumerRecords<?, ?> in) {
log.info("" + in.count() + "\n"
+ in.partitions().stream()
.map(part -> "" + part.partition() + "(" + in.records(part).size() + ")")
.collect(Collectors.toList()));
}
@Bean
public NewTopic topic() {
return TopicBuilder.name("so68201599").partitions(10).replicas(1).build();
}
@Bean
public ApplicationRunner runner(KafkaTemplate<String, String> template, KafkaListenerEndpointRegistry registry) {
String msg = new String(new byte[1024*5]);
return args -> {
List<Future<?>> futures = new ArrayList<>();
IntStream.range(0, 9000).forEach(i -> futures.add(template.send("so68201599", msg)));
futures.forEach(fut -> {
try {
fut.get(10, TimeUnit.SECONDS);
}
catch (InterruptedException e) {
Thread.currentThread().interrupt();
e.printStackTrace();
}
catch (ExecutionException e) {
e.printStackTrace();
}
catch (TimeoutException e) {
e.printStackTrace();
}
});
log.info("Sent");
registry.getListenerContainer("so68201599").start();
};
}
}
and
spring.kafka.consumer.max-poll-records = 2000
spring.kafka.consumer.properties.fetch.max.wait.ms = 10000
spring.kafka.consumer.fetch-min-size = 10240000
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.listener.type=batch
I get
2021-07-08 15:04:11.131 INFO 45792 --- [o68201599-0-C-1] com.example.demo.So68201599Application : 2000
[1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201), 6(191)]
2021-07-08 15:04:11.137 INFO 45792 --- [o68201599-0-C-1] com.example.demo.So68201599Application : 10
[6(10)]
2021-07-08 15:04:21.170 INFO 45792 --- [o68201599-0-C-1] com.example.demo.So68201599Application : 1809
[1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201)]
2021-07-08 15:04:21.214 INFO 45792 --- [o68201599-0-C-1] com.example.demo.So68201599Application : 2000
[1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201), 6(191)]
2021-07-08 15:04:21.215 INFO 45792 --- [o68201599-0-C-1] com.example.demo.So68201599Application : 10
[6(10)]
2021-07-08 15:04:31.248 INFO 45792 --- [o68201599-0-C-1] com.example.demo.So68201599Application : 1809
[1(201), 0(201), 5(201), 4(201), 3(201), 2(201), 9(201), 8(201), 7(201)]
2021-07-08 15:04:41.267 INFO 45792 --- [o68201599-0-C-1] com.example.demo.So68201599Application : 1083
[1(27), 0(87), 5(189), 4(93), 3(114), 2(129), 9(108), 8(93), 7(42), 6(201)]
2021-07-08 15:04:51.276 INFO 45792 --- [o68201599-0-C-1] com.example.demo.So68201599Application : 201
[6(201)]
2021-07-08 15:05:01.279 INFO 45792 --- [o68201599-0-C-1] com.example.demo.So68201599Application : 78
[6(78)]
I don't know why the second fetch and the three penultimate fetches timed out, though.
Kafka v2.4 consumer configurations:
kafka.consumer.auto.offset.reset=earliest
kafka.consumer.auto.commit=false
Kafka consumer container config:
@Bean
public ConcurrentKafkaListenerContainerFactory<String, PayoutDto> kafkaPayoutStatusPoolListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, PayoutDto> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(kafkaConsumerFactoryForPayoutEvent());
factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
factory.setMissingTopicsFatal(false);
return factory;
}
Kafka consumer:
#KafkaListener(id = "regularPayoutEventConsumer", topics = "${kafka.regular.payout.consumer.queuename}", containerFactory = "kafkaPayoutStatusPoolListenerContainerFactory", groupId = "${kafka.regular.payout.consumer.groupId}")
public void listen(ConsumerRecord<String, PayoutDto> consumerRecord, Acknowledgment ack) {
StopWatch watch = new StopWatch();
watch.start();
String key = null;
Long offset = null;
try {
PayoutDto payoutDto = consumerRecord.value();
key = consumerRecord.key();
offset = consumerRecord.offset();
cpAccountsService.processPayoutEvent(payoutDto);
ack.acknowledge();
} catch (Exception e) {
log.error("Exception occured in RegularPayoutEventConsumer due to following issue {}", e);
} finally {
watch.stop();
log.debug("tolal time taken by consumer for requestID:" + key + " on offset:" + offset + " is:"
+ watch.getTotalTimeMillis());
}
}
Success scenario:
1. The consumer fails to acknowledge because of an exception, which creates a lag; let's say the last committed offset is 30 and the lag is now 4.
2. On the next poll cycle after the poll interval, the consumer resumes from offset 30, consumes up to 33 normally, and the lag is now 0.
Failed scenario:
1. Same as step 1 of the success scenario.
2. Now, before the next poll interval, the producer pushes a new message.
3. On this new event, the consumer pulls data but jumps directly to offset 33, skipping 30, 31, and 32, and the lag is cleared to 0.
App startup logs of Kafka:
2021-04-14 10:38:06.132 INFO 10286 --- [ restartedMain] o.a.k.clients.consumer.KafkaConsumer : [Consumer clientId=consumer-RegularPayoutEventGroupId-3, groupId=RegularPayoutEventGroupId] Subscribed to topic(s): InstantPayoutTransactionsEv
2021-04-14 10:38:06.132 INFO 10286 --- [ restartedMain] o.s.s.c.ThreadPoolTaskScheduler : Initializing ExecutorService
2021-04-14 10:38:06.133 INFO 10286 --- [ restartedMain] o.a.k.clients.consumer.ConsumerConfig : ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [localhost:9092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = consumer-PayoutEventGroupId-4
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = PayoutEventGroupId
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 30000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class com.cms.cpa.config.KafkaPayoutDeserializer
2021-04-14 10:38:06.137 INFO 10286 --- [ restartedMain] o.a.kafka.common.utils.AppInfoParser : Kafka version: 2.6.0
2021-04-14 10:38:06.137 INFO 10286 --- [ restartedMain] o.a.kafka.common.utils.AppInfoParser : Kafka commitId: 62abe01bee039651
Kafka maintains two values for each consumer/partition: the committed offset (where the consumer will start if restarted) and the position (which record will be returned on the next poll).
Not acknowledging a record will not cause the position to be repositioned.
It is working as-designed; if you want to re-process a failed record, you need to use acknowledgment.nack() with an optional sleep time, or throw an exception and configure a SeekToCurrentErrorHandler.
In those cases, the container will reposition the partitions so that the failed record is redelivered. With the error handler you can "recover" the failed record after the retries are exhausted. When using nack(), the listener has to keep track of the attempts.
See https://docs.spring.io/spring-kafka/docs/current/reference/html/#committing-offsets
and https://docs.spring.io/spring-kafka/docs/current/reference/html/#annotation-error-handling
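For illustration, a minimal sketch of those two options, assuming spring-kafka 2.3 or later (the back-off values and recovery action are placeholders, not taken from the question; factory, consumerRecord, ack and cpAccountsService refer to the code in the question):
// Option 1: remove the try/catch, let the exception propagate, and configure the error
// handler on the container factory; the container re-seeks so the failed record is
// redelivered, and after the retries are exhausted the recoverer is called.
factory.setErrorHandler(new SeekToCurrentErrorHandler(
        (record, ex) -> log.error("Giving up on offset {}", record.offset(), ex), // recoverer
        new FixedBackOff(1000L, 2L))); // org.springframework.util.backoff.FixedBackOff: 2 retries, 1 s apart
// Option 2: keep the manual ack and call nack() on failure so the same record is redelivered;
// as noted above, the listener itself must keep track of how many attempts have been made.
try {
    cpAccountsService.processPayoutEvent(consumerRecord.value());
    ack.acknowledge();
} catch (Exception e) {
    ack.nack(1000L); // re-seek and re-poll the same record after a 1 second pause
}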
My project uses Reactor Kafka 1.2.2.RELEASE to send events serialized with Avro to my Kafka broker. This works, but sending events seems quite slow.
We plugged a custom metric into the lifecycle of the Mono that sends the event through KafkaSender.send(), and noticed it takes around 100 ms to deliver a message.
We did it this way:
send(event, code, getKafkaHeaders()).transform(withMetric("eventName"));
The send method just builds the record and sends it:
private Mono<Void> send(SpecificRecord value, String code, final List<Header> headers) {
final var producerRecord = new ProducerRecord<>("myTopic", null, code, value, headers);
final var record = Mono.just(SenderRecord.create(producerRecord, code));
return Optional.ofNullable(kafkaSender.send(record)).orElseGet(Flux::empty)
.switchMap(this::errorIfAnyException)
.doOnNext(res -> log.info("Successfully sent event on topic {}", res.recordMetadata().topic()))
.then();
}
And the withMetric transformer links a metric to the send mono lifecycle:
private Function<Mono<Void>, Mono<Void>> withMetric(final String methodName) {
return mono -> Mono.justOrEmpty(this.metricProvider)
.map(provider -> provider.buildMethodExecutionTimeMetric(methodName, "kafka"))
.flatMap(metric -> mono.doOnSubscribe(subscription -> metric.start())
.doOnTerminate(metric::end));
}
It is this custom metric that reports an average of 100 ms.
We compared it to our Kafka producer metrics, and noticed that those report an average of 40 ms to deliver a message (0 ms of queueing and 40 ms of request latency).
We have difficulty understanding the delta, and wonder whether it could come from the way Reactor Kafka sends events.
Can anybody help, please?
UPDATE
Here's a sample of my producer config:
acks = all
batch.size = 16384
buffer.memory = 33554432
client.dns.lookup = default
compression.type = none
connections.max.idle.ms = 540000
delivery.timeout.ms = 120000
enable.idempotence = true
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 2147483647
retry.backoff.ms = 100
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.trustmanager.algorithm = PKIX
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
value.serializer = class io.confluent.kafka.serializers.KafkaAvroSerializer
Also, maxInFlight is 256 and the scheduler is single; I didn't configure anything special there.
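For what it's worth, here is a hypothetical sketch of how those two settings are usually applied in reactor-kafka 1.2.x (producerProps stands for the producer configuration above; this is not the project's actual code):
SenderOptions<String, SpecificRecord> senderOptions =
        SenderOptions.<String, SpecificRecord>create(producerProps) // Map<String, Object> of the config above
                .maxInFlight(256)                 // matches the value mentioned above
                .scheduler(Schedulers.single());  // reactor.core.scheduler.Schedulers
KafkaSender<String, SpecificRecord> kafkaSender = KafkaSender.create(senderOptions);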
I am writing a Kafka client producer as follows:
public class BasicProducerExample {
public static void main(String[] args){
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, 0);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
//props.put(ProducerConfig.
props.put("batch.size","16384");// maximum size of message
Producer<String, String> producer = new KafkaProducer<String, String>(props);
TestCallback callback = new TestCallback();
Random rnd = new Random();
for (long i = 0; i < 2 ; i++) {
//ProducerRecord<String, String> data = new ProducerRecord<String, String>("dke", "key-" + i, "message-"+i );
// Topic and message
ProducerRecord<String, String> data = new ProducerRecord<String, String>("dke", ""+i);
producer.send(data, callback);
}
producer.close();
}
private static class TestCallback implements Callback {
@Override
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
if (e != null) {
System.out.println("Error while producing message to topic :" + recordMetadata);
e.printStackTrace();
} else {
String message = String.format("sent message to topic:%s partition:%s offset:%s", recordMetadata.topic(), recordMetadata.partition(), recordMetadata.offset());
System.out.println(message);
}
}
}
}
OUTPUT:
Error while producing message to topic :null
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
NOTE:
The broker at localhost:6667 is up and working.
In your property for BOOTSTRAP_SERVERS_CONFIG, try changing the port number to 6667.
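For example, the change would look something like this (everything else in the question's properties stays the same):
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:6667"); // HDP brokers listen on 6667, not 9092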
Thanks.
I use Apache Kafka on a Hortonworks (HDP 2.x) installation. The error message encountered means that the Kafka producer was not able to push the data to the segment log file. From a command-line console, that would mean two things:
You are using incorrect port for the brokers
Your listener config in server.properties are not working
If you encounter the error message while writing via the Scala API, additionally check the connection to the Kafka cluster using telnet <cluster-host> <broker-port>.
NOTE: If you are using the Scala API to create the topic, it takes some time for the brokers to learn about the newly created topic. So, immediately after topic creation, the producers might fail with the error Failed to update metadata after 60000 ms.
I did the following checks in order to resolve this issue:
The first difference, once I checked via Ambari, is that Kafka brokers listen on port 6667 on HDP 2.x (vanilla Apache Kafka uses 9092).
listeners=PLAINTEXT://localhost:6667
Next, use the IP address instead of localhost.
I executed netstat -na | grep 6667
tcp 0 0 192.30.1.5:6667 0.0.0.0:* LISTEN
tcp 1 0 192.30.1.5:52242 192.30.1.5:6667 CLOSE_WAIT
tcp 0 0 192.30.1.5:54454 192.30.1.5:6667 TIME_WAIT
So, I modified the producer call to use the IP and not localhost:
./kafka-console-producer.sh --broker-list 192.30.1.5:6667 --topic rdl_test_2
To check whether new records are being written, monitor the /kafka-logs folder.
cd /kafka-logs/<topic name>/
ls -lart
-rw-r--r--. 1 kafka hadoop 0 Feb 10 07:24 00000000000000000000.log
-rw-r--r--. 1 kafka hadoop 10485756 Feb 10 07:24 00000000000000000000.timeindex
-rw-r--r--. 1 kafka hadoop 10485760 Feb 10 07:24 00000000000000000000.index
Once the producer successfully writes, the segment log file 00000000000000000000.log will grow in size.
See the size below:
-rw-r--r--. 1 kafka hadoop 10485760 Feb 10 07:24 00000000000000000000.index
-rw-r--r--. 1 kafka hadoop **45** Feb 10 09:16 00000000000000000000.log
-rw-r--r--. 1 kafka hadoop 10485756 Feb 10 07:24 00000000000000000000.timeindex
At this point, you can run kafka-console-consumer.sh:
./kafka-console-consumer.sh --bootstrap-server 192.30.1.5:6667 --topic rdl_test_2 --from-beginning
The response is hello world.
After this step, if you want to produce messages via the Scala API, change the listeners value (from localhost to a public IP) and restart the Kafka brokers via Ambari:
listeners=PLAINTEXT://192.30.1.5:6667
A sample producer would be as follows:
package com.scalakafka.sample
import java.util.Properties
import java.util.concurrent.TimeUnit
import org.apache.kafka.clients.producer.{ProducerRecord, KafkaProducer}
import org.apache.kafka.common.serialization.{StringSerializer, StringDeserializer}
class SampleKafkaProducer {
case class KafkaProducerConfigs(brokerList: String = "192.30.1.5:6667") {
val properties = new Properties()
val batchsize :java.lang.Integer = 1
properties.put("bootstrap.servers", brokerList)
properties.put("key.serializer", classOf[StringSerializer])
properties.put("value.serializer", classOf[StringSerializer])
// properties.put("serializer.class", classOf[StringDeserializer])
properties.put("batch.size", batchsize)
// properties.put("linger.ms", 1)
// properties.put("buffer.memory", 33554432)
}
val producer = new KafkaProducer[String, String](KafkaProducerConfigs().properties)
def produce(topic: String, messages: Iterable[String]): Unit = {
messages.foreach { m =>
println(s"Sending $topic and message is $m")
val result = producer.send(new ProducerRecord(topic, m)).get()
println(s"the write status is ${result}")
}
producer.flush()
producer.close(10L, TimeUnit.MILLISECONDS)
}
}
Hope this helps someone.
I am trying the new 0.9 consumer and producer APIs, but I just can't seem to get them working. I have a producer that produces 100 messages per partition of a topic with two partitions. My code reads like this:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "1");
props.put("retries", 0);
props.put("batch.size", 2);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<>(props);
System.out.println("Created producer");
for (int i = 0; i < 100; i++) {
for (int j = 0; j < 2; j++) {
producer.send(new ProducerRecord<>("burrow_test_2", j, "M_"+ i + "_" + j + "_msg",
"M_" + i + "_" + j + "_msg"));
System.out.println("Sending msg " + i + " into partition " + j);
}
System.out.println("Sent 200 msgs");
}
System.out.println("Closing producer");
producer.close();
System.out.println("Closed producer");
Now producer.close() takes a really long time to close, after which I assume any buffers are flushed and the messages are written to the tail of the log.
Now for my consumer: I would like to read a specified number of messages before quitting. I chose to manually commit offsets as I read. The code reads like this:
int noOfMessagesToRead = Integer.parseInt(args[0]);
String groupName = args[1];
System.out.println("Reading " + noOfMessagesToRead + " for group " + groupName);
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", groupName);
//props.put("enable.auto.commit", "true");
//props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("burrow_test_2"));
//consumer.seekToEnd(new TopicPartition("burrow_test_2", 0), new TopicPartition("burrow_test_2", 1));
int noOfMessagesRead = 0;
while (noOfMessagesRead != noOfMessagesToRead) {
System.out.println("Polling....");
ConsumerRecords<String, String> records = consumer.poll(0);
for (ConsumerRecord<String, String> record : records) {
System.out.printf("offset = %d, key = %s, value = %s", record.offset(), record.key(),
record.value());
noOfMessagesRead++;
}
consumer.commitSync();
}
}
But my consumer is always stuck on the poll call (it never returns, even though I have specified the timeout as 0).
Now, just to confirm, I tried consuming from the command-line console consumer provided by Kafka: bin/kafka-console-consumer.sh --from-beginning --new-consumer --topic burrow_test_2 --bootstrap-server localhost:9092.
This seems to consume only the messages produced before I started the Java producer.
So my questions are:
What is the problem with the Java producer above?
Why is the consumer not polling anything (I have older messages that are being read just fine by the console consumer)?
UPDATE: My bad. I had port forwarding enabled from localhost to a remote machine running Kafka and ZooKeeper, and that seems to have caused the issue. Running against localhost directly produced the messages.
My only remaining question is that with the consumer API, I am unable to seek to an offset. I tried both the seekToEnd and seekToBeginning methods; both threw an exception saying no record found. So what is the way for a consumer to seek to an offset? Is the auto.offset.reset consumer property the only option?
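For what it's worth, a minimal sketch (against the 0.9 API used above) of one way to seek: seeks are only valid once partitions are actually assigned, so either call poll() once after subscribe() before seeking, or use manual assignment as below. (auto.offset.reset, by contrast, only applies when the group has no committed offset.)
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props); // same props as above
TopicPartition p0 = new TopicPartition("burrow_test_2", 0);
TopicPartition p1 = new TopicPartition("burrow_test_2", 1);
consumer.assign(Arrays.asList(p0, p1));   // manual assignment, no group rebalance to wait for
consumer.seekToBeginning(p0, p1);         // valid now that the partitions are assigned
ConsumerRecords<String, String> records = consumer.poll(1000);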