Kafka consumer is very slow to consume data and only consumes the first 500 records - apache-kafka

I am trying to integrate MongoDB and Storm-Kafka: the Kafka producer produces data from MongoDB, but on the consumer side it fails to fetch all records. It consumes only 500-600 records out of 1 million.
There are no errors in the log file; the topology is still alive but is not processing any further records.
Kafka version: 0.10.*, Storm version: 1.2.1
Do I need to add any configs on the consumer side?
conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);
conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2048);
conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
BrokerHosts hosts = new ZkHosts(zookeeperUrl);
SpoutConfig spoutConfig = new SpoutConfig(hosts, topic, zkRoot, consumerGroupId);
spoutConfig.scheme = new KeyValueSchemeAsMultiScheme(new StringKeyValueScheme());
spoutConfig.fetchSizeBytes = 25000000;
if (startFromBeginning) {
    spoutConfig.startOffsetTime = OffsetRequest.EarliestTime();
} else {
    spoutConfig.startOffsetTime = OffsetRequest.LatestTime();
}
return new KafkaSpout(spoutConfig);
}
I want the Kafka spout to read all the records from the Kafka topic that the producer writes.

Related

kafka consumer is not able to produce output

I have written a Kafka consumer in Scala. When I run the consumer, nothing shows up on the console.
I have used the code below:
val topicProducer = "testOutput"
val props = new Properties()
props.put("bootstrap.servers","host:9092,host:9092")
props.put("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer")
props.put("group.id", "test");
val kafkaConsumer = new KafkaConsumer[String, String](props)
val topic = Array("test").toList
kafkaConsumer.subscribe(topic)
val results = kafkaConsumer.poll(2000)
for ((record) <- results) {
producer.send(new ProducerRecord(topicProducer,"key","Value="+record.key()+" Record Key="+record.value()+"append"))
}
You also need to specify the auto.offset.reset property so that your consumer is able to consume messages from the beginning (equivalent to --from-beginning on the command line):
props.put("auto.offset.reset", "earliest");
According to the Kafka docs:
auto.offset.reset
What to do when there is no initial offset in ZooKeeper or if an offset is out of range:
smallest: automatically reset the offset to the smallest offset
largest: automatically reset the offset to the largest offset
anything else: throw an exception to the consumer
(These values are for the old ZooKeeper-based consumer; for the new consumer API used above, the equivalent values are earliest and latest.)
EDIT:
Alternatively, if you are using the old consumer API, then instead of --bootstrap-server host:9092 use the ZooKeeper parameter --zookeeper host:2181.
If this does not solve the issue, try deleting /brokers in ZooKeeper and then restarting the Kafka nodes:
bin/zookeeper-shell <zk-host>:2181
rmr /brokers

KafkaSpout multithread or not

The Kafka 0.8.x docs show how to multithread a Kafka consumer:
Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put(topic, new Integer(a_numThreads));
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);

// now launch all the threads
executor = Executors.newFixedThreadPool(a_numThreads);

// now create an object to consume the messages
int threadNumber = 0;
for (final KafkaStream stream : streams) {
    executor.execute(new ConsumerTest(stream, threadNumber));
    threadNumber++;
}
But KafkaSpout in Storm does not seem to be multithreaded.
Maybe use multiple tasks instead of multiple threads with KafkaSpout:
builder.setSpout(SqlCollectorTopologyDef.KAFKA_SPOUT_NAME, new KafkaSpout(spoutConfig), nThread);
Which one is better? Thanks
Since you mentioned Kafka 0.8.x, I am assuming the KafkaSpout you use is from storm-kafka rather than storm-kafka-client.
The first code snippet uses the high-level consumer API, which can employ multiple threads to consume multiple partitions.
As for the KafkaSpout, the effect is much the same, but Storm uses the low-level consumer, namely SimpleConsumer. However, there will be one SimpleConsumer instance created for each spout executor (task).
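So rather than spawning threads inside one spout, you scale by raising the spout's parallelism hint. A minimal sketch, assuming Storm 1.x package names, a spoutConfig built as in the question, and a hypothetical topic with three partitions:
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.topology.TopologyBuilder;

// One spout executor per partition; each executor creates its own SimpleConsumer,
// so the partitions are consumed in parallel without extra threads in user code.
int numPartitions = 3; // assumed partition count of the topic

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), numPartitions);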

how to find zkroot and clientid for SpoutConfig

I'm trying to connect to a remote Kafka cluster in Storm. I'm using the following code:
Broker brokerForPartition0 = new Broker("208.113.164.114:9091");
Broker brokerForPartition1 = new Broker("208.113.164.115:9092");
Broker brokerForPartition2 = new Broker("208.113.164.117:9093");
GlobalPartitionInformation partitionInfo = new GlobalPartitionInformation();
partitionInfo.addPartition(0, brokerForPartition2); // mapping from partition 0 to brokerForPartition2
partitionInfo.addPartition(1, brokerForPartition0); // mapping from partition 1 to brokerForPartition0
partitionInfo.addPartition(2, brokerForPartition1); // mapping from partition 2 to brokerForPartition1
StaticHosts hosts = new StaticHosts(partitionInfo);
SpoutConfig spoutConfig = new SpoutConfig(hosts, "newImageTest","/brokers","console-consumer-61818");
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
In the instantiation of spoutConfig, I have to pass the zkRoot and clientId as parameters.
public SpoutConfig(BrokerHosts hosts, String topic, String zkRoot, String id);
Where can I find these two information? Or should I create them?
Thank you!
From this documentation,
Spoutconfig is an extension of KafkaConfig that supports additional fields with ZooKeeper connection info and for controlling behavior specific to KafkaSpout. The Zkroot will be used as root to store your consumer's offset. The id should uniquely identify your spout.
zkRoot, therefore, should be some ZNode path like /some/path, which will be used to store your consumer's offset as mentioned.
id is some string (say, a UUID) that can be used to uniquely identify your spout, as mentioned.
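Putting it together, a minimal sketch reusing the hosts object from the question; the zkRoot path here is just an illustrative value, and the id is a generated UUID (use a fixed string instead if you want offsets to survive restarts):
import java.util.UUID;

// ZNode path under which the spout stores consumed offsets (illustrative path)
String zkRoot = "/kafka-storm-offsets";
// Unique id for this spout's offset state; a random UUID is unique, but note it
// changes on every run, so a stable string is better if offsets should be resumed
String clientId = UUID.randomUUID().toString();

SpoutConfig spoutConfig = new SpoutConfig(hosts, "newImageTest", zkRoot, clientId);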

How to change default kafka SpoutConfig class

I am getting messages of 3 MB from a Kafka topic, but the default limit is 1 MB. I have changed the Kafka properties from 1 MB to 3 MB by adding the lines below to the Kafka consumer.properties and server.properties files.
fetch.message.max.bytes=2048576 (consumer.properties)
message.max.bytes=2048576 (server.properties)
replica.fetch.max.bytes=2048576 (server.properties)
After adding the above lines, the 3 MB messages are written to the Kafka data logs. But Storm is unable to process that 3 MB data; it can only read the default size, i.e. 1 MB.
So how do I change those configurations in order to process/read the 3 MB data? Here is my topology class.
String argument = args[0];
Config conf = new Config();
conf.put(JDBC_CONF, map);
conf.setDebug(true);
conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
//set the number of workers
conf.setNumWorkers(3);
TopologyBuilder builder = new TopologyBuilder();
//Setup Kafka spout
BrokerHosts hosts = new ZkHosts("localhost:2181");
String topic = "year1234";
String zkRoot = "";
String consumerGroupId = "group1";
SpoutConfig spoutConfig = new SpoutConfig(hosts, topic, zkRoot, consumerGroupId);
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
builder.setSpout("KafkaSpout", kafkaSpout,1);
builder.setBolt("user_details", new Parserspout(),1).shuffleGrouping("KafkaSpout");
builder.setBolt("bolts_user", new bolts_user(cp),1).shuffleGrouping("user_details");
Add the following lines below:
SpoutConfig spoutConfig = new SpoutConfig(hosts, topic, zkRoot, consumerGroupId);
spoutConfig.fetchSizeBytes = 3048576;
spoutConfig.bufferSizeBytes = 3048576;

Storm KafkaSpout can't read offset when the topic has more than 1 partition

My KafkaSpout setup is:
SpoutConfig spoutConf = new SpoutConfig(brokerHosts, topic, zkRoot,clientId);
spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConf.forceFromStart = false;
spoutConf.zkServers=...
spoutConf.zkPort = 2181;
spoutConf.zkHost = ...
spoutConf.zkRoot = zkRoot;
just like Failing to write offset data to zookeeper in kafka-storm,
but I found that when my topic has more than one partition, my KafkaSpout can't read the offset from ZooKeeper, and no offsets appear in ZooKeeper.