KafkaSpout (idle) generates a huge network traffic - apache-kafka

After developing and running my Storm (1.0.1) topology with a KafkaSpout and a couple of bolts, I noticed heavy network traffic even when the topology is idle (no messages on Kafka, no processing in the bolts). So I commented out my topology piece by piece to find the cause, and now I have only the KafkaSpout in my main:
....
final SpoutConfig spoutConfig = new SpoutConfig(
new ZkHosts(zkHosts, "/brokers"),
"files-topic", // topic
"/kafka", // ZK chroot
"consumer-group-name");
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConfig.startOffsetTime = OffsetRequest.LatestTime();
topologyBuilder.setSpout(
"kafka-spout-id",
new KafkaSpout(spoutConfig),
1);
....
When this (useless) topology runs, even in local mode and even the very first time, network traffic grows a lot. In my Activity Monitor I see:
An average of 432 KB of data received per second
After the topology has been running (idle) for a couple of hours, data received is 1.26 GB and data sent is 1 GB
(Important: Kafka is not running as a cluster. It is a single instance on the same machine, with a single topic and a single partition. I just downloaded Kafka, started it and created a simple topic. When I put a message in the topic, everything in the topology works without any problem.)
Obviously, the reason is in the KafkaSpout.nextTuple() method (below), but I don't understand why I should see such traffic when there are no messages in Kafka. Is there something I didn't consider? Is this the expected behaviour? I looked at the Kafka and ZK logs and found nothing. I cleaned up the Kafka and ZK data and still see the same behaviour.
@Override
public void nextTuple() {
List<PartitionManager> managers = _coordinator.getMyManagedPartitions();
for (int i = 0; i < managers.size(); i++) {
try {
// in case the number of managers decreased
_currPartitionIndex = _currPartitionIndex % managers.size();
EmitState state = managers.get(_currPartitionIndex).next(_collector);
if (state != EmitState.EMITTED_MORE_LEFT) {
_currPartitionIndex = (_currPartitionIndex + 1) % managers.size();
}
if (state != EmitState.NO_EMITTED) {
break;
}
} catch (FailedFetchException e) {
LOG.warn("Fetch failed", e);
_coordinator.refresh();
}
}
long diffWithNow = System.currentTimeMillis() - _lastUpdateMs;
/*
As far as the System.currentTimeMillis() is dependent on System clock,
additional check on negative value of diffWithNow in case of external changes.
*/
if (diffWithNow > _spoutConfig.stateUpdateIntervalMs || diffWithNow < 0) {
commit();
}
}

Put a sleep of one second (1000 ms) in the nextTuple() method and observe the traffic again. For example:
@Override
public void nextTuple() {
try {
Thread.sleep(1000);
} catch (Exception ex) {
LOG.error("Exception while sleeping...", ex);
}
List<PartitionManager> managers = _coordinator.getMyManagedPartitions();
for (int i = 0; i < managers.size(); i++) {
...
...
...
...
}
The reason is that the Kafka consumer works on a pull model: consumers pull data from the Kafka brokers. So the consumer (the KafkaSpout) continuously sends fetch requests to the broker, and each one is a TCP network request. That is why you see such large numbers for data sent/received. Even when the consumer doesn't consume any message, the fetch requests and the empty responses still count toward the sent/received statistics. The longer the sleep, the lower your network traffic. There are also some network-related configurations for the brokers and for the consumer; researching those may help. Hope this helps.
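A fixed one-second sleep also delays every message by up to a second. A common compromise is to back off only while polls come back empty. Here is a minimal sketch of that idea in plain Java; the class name IdleBackoff and the 50 ms / 1000 ms bounds are made up for illustration and are not part of Storm or Kafka. In a spout, the returned delay would go where the Thread.sleep(1000) above is.

```java
/** Hypothetical helper: grows the sleep while polls are empty, resets when data arrives. */
class IdleBackoff {
    private final long minDelayMs;
    private final long maxDelayMs;
    private long currentDelayMs = 0;

    IdleBackoff(long minDelayMs, long maxDelayMs) {
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Returns how long to sleep before the next poll, given the size of the last fetch. */
    long nextDelayMs(int recordsFetched) {
        if (recordsFetched > 0) {
            currentDelayMs = 0;                 // data is flowing: poll again immediately
        } else if (currentDelayMs == 0) {
            currentDelayMs = minDelayMs;        // first empty poll: start small
        } else {
            currentDelayMs = Math.min(currentDelayMs * 2, maxDelayMs); // keep doubling up to the cap
        }
        return currentDelayMs;
    }

    public static void main(String[] args) {
        IdleBackoff backoff = new IdleBackoff(50, 1000);
        int[] fetched = {0, 0, 0, 0, 0, 0, 3, 0};   // simulated fetch sizes
        for (int n : fetched) {
            System.out.println("fetched=" + n + " -> sleep " + backoff.nextDelayMs(n) + " ms");
        }
    }
}
```

With this shape an idle topology settles at roughly one fetch per second, while a busy one is not slowed down at all.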

Is your bolt receiving messages? Does your bolt extend BaseRichBolt?
Comment out the line m.fail(id.offset) in KafkaSpout and check again. If your bolt doesn't ack, the spout assumes that the message has failed and tries to replay it.
public void fail(Object msgId) {
KafkaMessageId id = (KafkaMessageId) msgId;
PartitionManager m = _coordinator.getManager(id.partition);
if (m != null) {
//m.fail(id.offset);
}
}
Also try pausing nextTuple() for a few milliseconds and check again.
Let me know if it helps.

Related

kafka asynchronous produce lost message

I tried to follow instructions from the internet to implement asynchronous produce with Kafka. Here is what my producer looks like:
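import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;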
public void asynSend(String topic, Integer partition, String message) {
ProducerRecord<Object, Object> data = new ProducerRecord<>(topic, partition,null, message);
producer.send(data, new DefaultProducerCallback());
}
private static class DefaultProducerCallback implements Callback {
@Override
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
if (e != null) {
logger.error("Asynchronous produce failed", e);
}
}
}
And I produce in a for loop like this:
for (int i = 0; i < 5000; i++) {
int partition = i % 2;
FsProducerFactory.getInstance().asynSend(topic, partition,i + "th message to partition " + partition);
}
However, some messages may get lost. In my run, messages 4508 to 4999 were not delivered.
I think the reason is that the producer process shuts down, and all messages still in the cache and not yet sent at that time are lost.
Adding this line after the for loop solves the problem:
producer.flush();
However, I am not sure whether it is a good solution, because I noticed someone mentioned that flush makes an asynchronous send somewhat synchronous. Can anyone explain or help me improve it?
In the book Kafka: The Definitive Guide there is an example of an asynchronous producer written exactly like your code. It uses send together with a Callback.
In a discussion it is written:
Adding flush() before exiting will make the client wait for any outstanding messages to be delivered to the broker (and this will be around queue.buffering.max.ms, plus latency).
If you add flush() after each produce() call you are effectively implementing a sync producer.
But if you call it once after the for loop, the sends are still asynchronous; only the final wait is blocking.
You could also set acks to all in the producer configuration. That gives you stronger delivery guarantees when the topic's replication factor is greater than 1.
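To make the difference concrete, here is a toy model in plain Java. It is not the Kafka producer: ToyAsyncSender, its methods and the single background thread are all made up for illustration. send() only queues work, flush() waits for everything queued so far, and close() stands in for the process exiting and drops whatever is still queued.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy stand-in for a buffering producer (hypothetical, not a Kafka class). */
class ToyAsyncSender {
    private final ExecutorService io = Executors.newSingleThreadExecutor();
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();
    private final AtomicInteger delivered = new AtomicInteger();

    /** Returns immediately; "delivery" happens later on the background thread. */
    synchronized void send(String message) {
        inFlight.add(CompletableFuture.runAsync(() -> delivered.incrementAndGet(), io));
    }

    /** Blocks until everything sent so far has been delivered. */
    synchronized void flush() {
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        inFlight.clear();
    }

    /** Models process exit: queued but undelivered messages are dropped. */
    void close() {
        io.shutdownNow();
    }

    int deliveredCount() {
        return delivered.get();
    }

    /** Sends 5000 messages without waiting in between, then optionally flushes once. */
    static int run(boolean flushBeforeExit) {
        ToyAsyncSender sender = new ToyAsyncSender();
        for (int i = 0; i < 5000; i++) {
            sender.send(i + "th message");   // never blocks on delivery
        }
        if (flushBeforeExit) {
            sender.flush();                  // one blocking wait, after the loop
        }
        sender.close();
        return sender.deliveredCount();
    }

    public static void main(String[] args) {
        System.out.println("with flush:    " + run(true));
        System.out.println("without flush: " + run(false));
    }
}
```

With the flush, all 5000 messages are counted. Without it, the count depends on how far the background thread got before close(), which mirrors the lost tail of messages in the question.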

Kafka Consumer Poll runs indefinitely and doesn't return anything

I am having difficulty with KafkaConsumer.poll(Duration timeout): it runs indefinitely and never returns. I understand that this could be related to the connection, and I have seen it behave inconsistently at times. How do I handle this if poll stops responding? Below is the snippet from KafkaConsumer.poll():
public ConsumerRecords<K, V> poll(final Duration timeout) {
return poll(time.timer(timeout), true);
}
and I am calling the above from here :
Duration timeout = Duration.ofSeconds(30);
while (true) {
final ConsumerRecords<recordID, topicName> records = consumer.poll(timeout);
System.out.println("record count is" + records.count());
}
I am getting the below error:
org.apache.kafka.common.errors.SerializationException: Error
deserializing key/value for partition at offset 2. If
needed, please seek past the record to continue consumption.
I stumbled upon some useful information while trying to fix the problem above. I will provide code that should handle this, but first it is important to know what causes it.
When producing or consuming data with Apache Kafka, the message needs a schema, in my case an Avro schema. If a message produced to Kafka conflicts with that schema, consuming it will fail.
Add the code below to your consumer, in the method where it consumes records.
Remember to import the packages below:
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.SerializationException;
try {
while (true) {
ConsumerRecords<String, GenericRecord> records;
try {
records = consumer.poll(10000);
} catch (SerializationException e) {
// The message has the form:
// "Error deserializing key/value for partition <topic>-<partition> at offset <offset>. If needed, ..."
String s = e.getMessage().split("Error deserializing key/value for partition ")[1].split("\\. If needed, please seek past the record to continue consumption\\.")[0];
// Note: this split assumes the topic name contains no hyphen.
String topic = s.split("-")[0];
int offset = Integer.valueOf(s.split("offset ")[1]);
int partition = Integer.valueOf(s.split("-")[1].split(" at")[0]);
TopicPartition topicPartition = new TopicPartition(topic, partition);
//log.info("Skipping " + topic + "-" + partition + " offset " + offset);
// Skip the record that cannot be deserialized.
consumer.seek(topicPartition, offset + 1);
// This poll returned nothing, so go straight to the next poll.
continue;
}
for (ConsumerRecord<String, GenericRecord> record : records) {
System.out.printf("value = %s \n", record.value());
}
}
} finally {
consumer.close();
}
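The string splitting in the snippet above assumes the topic name has no hyphen in it. A regular expression handles that case more robustly. This is only a sketch: the class and method names are made up, and it still relies on the exact wording of the exception message quoted in the question, which may differ between client versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts topic, partition and offset from the error text. */
class DeserializationErrorParser {
    // Matches "... for partition <topic>-<partition> at offset <offset>".
    // The greedy (.+) keeps hyphens inside the topic name with the topic.
    private static final Pattern POSITION =
            Pattern.compile("for partition (.+)-(\\d+) at offset (\\d+)");

    static final class Position {
        final String topic;
        final int partition;
        final long offset;

        Position(String topic, int partition, long offset) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
        }
    }

    /** Returns the failing position, or null if the message has a different shape. */
    static Position parse(String message) {
        if (message == null) {
            return null;
        }
        Matcher m = POSITION.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new Position(m.group(1), Integer.parseInt(m.group(2)), Long.parseLong(m.group(3)));
    }

    public static void main(String[] args) {
        Position p = parse("Error deserializing key/value for partition my-topic-3 at offset 42. "
                + "If needed, please seek past the record to continue consumption.");
        System.out.println(p.topic + " / " + p.partition + " / " + p.offset);
    }
}
```

The caller would then seek to offset + 1 on that topic and partition, as in the catch block above, and rethrow the exception when parse returns null.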
I ran into this while setting up a test environment.
Running the following command on the broker printed out the stored records as one would expect:
bin/kafka-console-consumer.sh --bootstrap-server="localhost:9092" --topic="foo" --from-beginning
It turned out that the Kafka server was misconfigured. To connect from an external IP address, listeners must have a valid value in kafka/config/server.properties, e.g.
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092

Return from Kafka consumer when there is no message

I want to process a topic in application startup using Confluent dotnet client. Assume following example:
while (true)
{
try
{
var cr = c.Consume();
Console.WriteLine($"Consumed message '{cr.Value}' at: '{cr.TopicPartitionOffset}'.");
}
catch (ConsumeException e)
{
Console.WriteLine($"Error occured: {e.Error.Reason}");
}
}
When there is no new message in Kafka, c.Consume blocks. Because I want to use it at application startup (like a cache warm-up), I want my code to proceed once I find there are no new messages.
I know there is an overload for setting a timeout, c.Consume(timeout), but the problem with this approach is that if there is a message in your topic and reading it takes longer than your timeout, you receive null, which is not desirable.
The consumer(s) is not supposed to be aware of the producer(s).
Now if you want to know that you have read everything in the topic from the moment you start to consume, you can:
Load the newest offset before starting to consume.
Then start consuming messages.
If the message's offset is the same as the newest offset you loaded before, stop consuming.
I'm not a C# developer, but from what I read in the Confluent dotnet docs you can call QueryWatermarkOffsets on the consumer to get the oldest and newest offsets. https://docs.confluent.io/current/clients/confluent-kafka-dotnet/api/Confluent.Kafka.Consumer.html#Confluent_Kafka_Consumer_QueryWatermarkOffsets_Confluent_Kafka_TopicPartition_
And then, on the Message class you have an Offset accessor. So the whole thing should not be too hard to achieve.
https://docs.confluent.io/current/clients/confluent-kafka-dotnet/api/Confluent.Kafka.Message.html#Confluent_Kafka_Message_Offset
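The three steps above can be sketched as follows. This is plain Java over an Iterator of offsets rather than a real consumer, and the names CatchUpReader and readUntil are made up. One detail to watch, assuming the usual Kafka meaning of the high watermark: it is the offset the next message will get, so the last message that already exists has offset high - 1, and an empty partition has low equal to high.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: read a partition only up to the end offset sampled at startup. */
class CatchUpReader {
    /**
     * Consumes offsets from the partition and stops at the last message that
     * existed when the watermarks were queried.
     */
    static List<Long> readUntil(Iterator<Long> partition, long lowWatermark, long highWatermark) {
        List<Long> consumed = new ArrayList<>();
        if (highWatermark <= lowWatermark) {
            return consumed;                  // empty partition: do not block waiting for data
        }
        long lastOffset = highWatermark - 1;  // high watermark is the offset of the next message
        while (partition.hasNext()) {
            long offset = partition.next();
            consumed.add(offset);
            if (offset >= lastOffset) {       // >= tolerates gaps in the offset sequence
                break;
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        Iterator<Long> partition = Arrays.asList(0L, 1L, 2L, 3L, 4L, 5L, 6L).iterator();
        // Watermarks sampled before consuming: low = 0, high = 5.
        System.out.println(readUntil(partition, 0, 5));
    }
}
```

Messages that arrive after the watermarks were sampled (offsets 5 and 6 in the demo data) are left for the normal consume loop.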
You can use OnPartitionEOF event that indicates you have reached the end of partition.
CancellationTokenSource source = new CancellationTokenSource();
bool isContinue = true;
c.OnPartitionEOF += (o, e) =>
{
Console.WriteLine($"You have reached end of partition");
isContinue = false;
source.Cancel();
};
while (isContinue)
{
try
{
var cr = c.Consume(source.Token);
Console.WriteLine($"Consumed message '{cr.Value}' at: '{cr.TopicPartitionOffset}'.");
}
catch (ConsumeException e)
{
Console.WriteLine($"Error occured: {e.Error.Reason}");
}
}
I found the Consumer.IsPartitionEOF useful.

How to check if Kafka Consumer is ready

I have the Kafka commit policy set to latest and am missing the first few messages. If I sleep for 20 seconds before starting to send messages to the input topic, everything works as desired. I am not sure whether the problem is the consumer taking a long time for partition rebalancing. Is there a way to know whether the consumer is ready before starting to poll?
You can use consumer.assignment(): it returns the set of assigned partitions, so you can verify whether all of the partitions available for that topic have been assigned.
If you are using the spring-kafka project, you can include the spring-kafka-test dependency and use the method below to wait for topic assignment, but you need to have a container.
ContainerTestUtils.waitForAssignment(Object container, int partitions);
You can do the following.
I have a test that reads data from a Kafka topic.
You can't use KafkaConsumer from multiple threads, but you can pass in an AtomicReference parameter named assignment, update it in the consumer thread, and read it in another thread.
For example, a snippet of working code from a test project:
private void readAvro(String readFromKafka,
AtomicBoolean needStop,
List<Event> events,
String bootstrapServers,
int readTimeout) {
// print the topic name
AtomicReference<Set<TopicPartition>> assignment = new AtomicReference<>();
new Thread(() -> readAvro(bootstrapServers, readFromKafka, needStop, events, readTimeout, assignment)).start();
long startTime = System.currentTimeMillis();
long maxWaitingTime = 30_000;
for (long time = System.currentTimeMillis(); System.currentTimeMillis() - time < maxWaitingTime;) {
Set<TopicPartition> assignments = Optional.ofNullable(assignment.get()).orElse(new HashSet<>());
System.out.println("[!kafka-consumer!] Assignments [" + assignments.size() + "]: "
+ assignments.stream().map(v -> String.valueOf(v.partition())).collect(Collectors.joining(",")));
if (assignments.size() > 0) {
break;
}
try {
Thread.sleep(1_000);
} catch (InterruptedException e) {
e.printStackTrace();
needStop.set(true);
break;
}
}
System.out.println("Subscribed! Wait summary: " + (System.currentTimeMillis() - startTime));
}
private void readAvro(String bootstrapServers,
String readFromKafka,
AtomicBoolean needStop,
List<Event> events,
int readTimeout,
AtomicReference<Set<TopicPartition>> assignment) {
KafkaConsumer<String, byte[]> consumer = (KafkaConsumer<String, byte[]>) queueKafkaConsumer(bootstrapServers, "latest");
System.out.println("Subscribed to topic: " + readFromKafka);
consumer.subscribe(Collections.singletonList(readFromKafka));
long started = System.currentTimeMillis();
while (!needStop.get()) {
assignment.set(consumer.assignment());
ConsumerRecords<String, byte[]> records = consumer.poll(1_000);
events.addAll(CommonUtils4Tst.readEvents(records));
if (readTimeout == -1) {
if (events.size() > 0) {
break;
}
} else if (System.currentTimeMillis() - started > readTimeout) {
break;
}
}
needStop.set(true);
synchronized (MainTest.class) {
MainTest.class.notifyAll();
}
consumer.close();
}
P.S.
needStop - global flag used to stop all running threads, if any, on failure or success
events - list of objects that I want to check
readTimeout - how long to wait until all data has been read; if readTimeout == -1, stop as soon as anything has been read
Thanks to Alexey (I have also upvoted), I seem to have resolved my issue by following essentially the same idea.
Just want to share my experience. In our case we use Kafka in a request/response way, somewhat like RPC: a request is sent on one topic and we then wait for the response on another topic. I ran into a similar issue, i.e. missing the first response.
I tried calling KafkaConsumer.assignment() repeatedly (with Thread.sleep(100) in between), but it didn't seem to help. Adding a KafkaConsumer.poll(50) seems to have primed the consumer (group), and the first response is now received too. I tested it a few times and it works consistently now.
BTW, testing requires stopping the application and deleting the Kafka topics; for good measure, I restarted Kafka too.
PS: Just calling poll(50) without the assignment() fetching logic that Alexey mentioned may not guarantee that the consumer (group) is ready.
You can modify an AlwaysSeekToEndListener (listens only to new messages) to include a callback:
public class AlwaysSeekToEndListener<K, V> implements ConsumerRebalanceListener {
private final Consumer<K, V> consumer;
private Runnable callback;
public AlwaysSeekToEndListener(Consumer<K, V> consumer) {
this.consumer = consumer;
}
public AlwaysSeekToEndListener(Consumer<K, V> consumer, Runnable callback) {
this.consumer = consumer;
this.callback = callback;
}
@Override
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
}
@Override
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
consumer.seekToEnd(partitions);
if (callback != null) {
callback.run();
}
}
}
and subscribe with a latch callback:
CountDownLatch initLatch = new CountDownLatch(1);
consumer.subscribe(singletonList(topic), new AlwaysSeekToEndListener<>(consumer, () -> initLatch.countDown()));
initLatch.await(); // blocks until consumer is ready and listening
then proceed to start your producer.
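One caveat worth sketching: as far as I know, rebalance listener callbacks only run inside the consumer's poll() call, so something has to be polling on another thread while you wait on the latch, and a bounded wait is safer than blocking forever. Below is a plain-Java sketch of that pattern; the background thread only stands in for a real poll loop, and ReadyLatchDemo, awaitReady and the delay values are made up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: wait for a readiness callback with a timeout. */
class ReadyLatchDemo {
    /**
     * Starts a thread that stands in for the consumer's poll loop and waits
     * until it signals readiness or the timeout elapses.
     */
    static boolean awaitReady(long assignDelayMs, long timeoutMs) {
        CountDownLatch ready = new CountDownLatch(1);
        Thread pollLoop = new Thread(() -> {
            try {
                Thread.sleep(assignDelayMs);   // stand-in for the time the rebalance takes
                ready.countDown();             // what the listener's callback would do
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        pollLoop.setDaemon(true);
        pollLoop.start();
        try {
            // Bounded wait: returns false instead of hanging if no assignment arrives.
            return ready.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pollLoop.interrupt();              // stop the stand-in thread
        }
    }

    public static void main(String[] args) {
        System.out.println("ready in time: " + awaitReady(10, 2000));
        System.out.println("ready in time: " + awaitReady(5000, 50));
    }
}
```

When the wait times out, the caller can fail the test or log a warning rather than start the producer against a consumer that is not listening yet.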
If your policy is set to latest, which only takes effect when there are no previously committed offsets, and you indeed have no previously committed offsets, then you should not worry about 'missing' messages: you are telling Kafka not to care about messages that were sent before your consumers were ready.
If you care about 'previous' messages, you should set the policy to earliest.
In any case, whatever the policy, the behaviour you are seeing is transient: once committed offsets are saved in Kafka, the consumers will pick up where they left off on every restart.
I needed to know whether a Kafka consumer was ready before doing some testing, so I tried consumer.assignment(). It only returns the set of assigned partitions, though, so I could not see whether the partitions assigned to the group had an offset set; later, when I tried to use the consumer, its offsets were not set correctly.
The solution was to use committed(), which gives you the last committed offsets of the partitions you pass as arguments.
So you can do something like: consumer.committed(consumer.assignment())
If there are no partitions assigned yet, it returns:
{}
If there are partitions assigned, but no offset yet:
{name.of.topic-0=null, name.of.topic-1=null}
But if there are partitions and offsets:
{name.of.topic-0=OffsetAndMetadata{offset=5197881, leaderEpoch=null, metadata=''}, name.of.topic-1=OffsetAndMetadata{offset=5198832, leaderEpoch=null, metadata=''}}
With this information you can use something like:
consumer.committed(consumer.assignment()).isEmpty();
consumer.committed(consumer.assignment()).containsValue(null);
And when both checks return false, you can be sure that the Kafka consumer is ready.
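The two checks can be folded into one predicate. A small sketch over a plain Map, so it runs without a broker; in real code the map would be whatever consumer.committed(consumer.assignment()) returns, and the names CommittedCheck and hasCommittedOffsets are made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper combining the isEmpty() and containsValue(null) checks. */
class CommittedCheck {
    /** True only if at least one partition is assigned and every partition has an offset. */
    static <K, V> boolean hasCommittedOffsets(Map<K, V> committed) {
        return !committed.isEmpty() && !committed.containsValue(null);
    }

    public static void main(String[] args) {
        Map<String, Long> committed = new HashMap<>();
        System.out.println(hasCommittedOffsets(committed));   // no partitions assigned yet

        committed.put("name.of.topic-0", null);
        System.out.println(hasCommittedOffsets(committed));   // assigned, but no offset yet

        committed.put("name.of.topic-0", 5197881L);
        System.out.println(hasCommittedOffsets(committed));   // assigned, with an offset
    }
}
```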

HornetQ Consumer Stops Receiving Messages After N hours

I'm using a HornetQ(v2.2.13) consumer, running standalone, to read off a Persistent topic published by a JBOSS server(7.1.1 final). All goes well for several hours (between 2-6) and then the consumer just stops receiving messages from the topic. From the log file on the server, I see data keeps getting pumped down the pipe but the consumer log file indicates the client stopped reading the data. I deduced that from the client saying the last time it read a message off the topic was 12:00:00 and the server log saying the last time it pushed a message to the topic was 14:00:00.
I've tried adjusting the HornetQ configs, but it doesn't seem to be working for a sustainable duration.
The code I use to communicate with the topic is as follows.
private TransportConfiguration getTC(String hostname) {
Map<String,Object> params = new HashMap<String, Object>();
params.put(TransportConstants.HOST_PROP_NAME, hostname);
params.put(TransportConstants.PORT_PROP_NAME, 5445);
TransportConfiguration tc = new TransportConfiguration(NettyConnectorFactory.class.getName(), params);
return tc;
}
private Topic createDestination(String destinationName) {
Topic topic = new HornetQTopic(destinationName);
return topic;
}
private HornetQConnectionFactory createCF(TransportConfiguration tc) {
HornetQConnectionFactory cf = HornetQJMSClient.createConnectionFactoryWithoutHA(JMSFactoryType.CF, tc);
return cf == null ? null : cf;
}
Code snippet that creates the session and starts it:
TransportConfiguration tc = this.getTC(this.hostname);
HornetQConnectionFactory cf = this.createCF(tc);
cf.setRetryInterval(4000);
cf.setReconnectAttempts(10);
cf.setConfirmationWindowSize(1000000);
Destination destination = this.createDestination(this.topicName);
logger.info("Starting Topic Connection");
try {
this.connection = cf.createConnection();
connection.start();
this.session = connection.createSession(transactional, ackMode);
MessageConsumer consumer = session.createConsumer(destination);
consumer.setMessageListener(this);
logger.info("Started topic connection");
} catch (Exception ex) {
ex.printStackTrace();
logger.error("EXCEPTION!");
}
Do you not see any log on the server side about the client getting disconnected?
Did you try playing with client-failure-check-period and the other ping parameters?
What about the VM settings?
How are you acknowledging the messages? I see that you created the session as transactional. Are you sure you are committing the transaction as you receive messages?