Storm KafkaSpout can't read offsets when the topic has more than one partition

My KafkaSpout setup is:
SpoutConfig spoutConf = new SpoutConfig(brokerHosts, topic, zkRoot,clientId);
spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConf.forceFromStart = false;
spoutConf.zkServers=...
spoutConf.zkPort = 2181;
spoutConf.zkHost = ...
spoutConf.zkRoot = zkRoot;
just like in "Failing to write offset data to zookeeper in kafka-storm",
but I have found that when my topic has more than one partition, my KafkaSpout can't read offsets from ZooKeeper, and no offset entries appear in ZooKeeper at all.
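To check whether offsets are being committed at all, I can list the per-partition offset nodes that storm-kafka is expected to create under <zkRoot>/<clientId>. A minimal sketch using the plain ZooKeeper Java client (the connection string and the path are placeholders for my setup):

import org.apache.zookeeper.ZooKeeper;

public class OffsetPathCheck {
    public static void main(String[] args) throws Exception {
        // placeholder connection string; use the topology's ZooKeeper ensemble
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
        String base = "/zkRoot/clientId"; // zkRoot + "/" + clientId passed to SpoutConfig
        for (String child : zk.getChildren(base, false)) {           // e.g. partition_0, partition_1
            byte[] data = zk.getData(base + "/" + child, false, null);
            System.out.println(child + " -> " + new String(data));   // JSON containing the committed offset
        }
        zk.close();
    }
}

With one partition there should be a single child node; with more partitions, one node per partition. If none appear, the spout is not committing offsets to this ZooKeeper at all.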

Related

Kafka consumer is very slow to consume data and only consumes the first 500 records

I am trying to integrate MongoDB and Storm-Kafka: the Kafka producer produces data from MongoDB, but the consumer side fails to fetch all the records. It only consumes 500-600 records out of 1 million.
There are no errors in the log file and the topology is still alive, but it is not processing any further records.
Kafka version: 0.10.*, Storm version: 1.2.1
Do I need to add any configs on the consumer side?
conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);
conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2048);
conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
BrokerHosts hosts = new ZkHosts(zookeeperUrl);
SpoutConfig spoutConfig = new SpoutConfig(hosts, topic, zkRoot, consumerGroupId);
spoutConfig.scheme = new KeyValueSchemeAsMultiScheme(new StringKeyValueScheme());
spoutConfig.fetchSizeBytes = 25000000;
if (startFromBeginning) {
    spoutConfig.startOffsetTime = OffsetRequest.EarliestTime();
} else {
    spoutConfig.startOffsetTime = OffsetRequest.LatestTime();
}
return new KafkaSpout(spoutConfig);
}
I want the Kafka spout to read all the records from the Kafka topic that the producer writes.
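One common cause worth ruling out (an assumption on my part, not something stated in the question): if the bolts never ack their tuples, the spout stops emitting once TOPOLOGY_MAX_SPOUT_PENDING unacked tuples have accumulated, even though the topology stays alive. A minimal sketch of a bolt that acks every tuple so the pending count keeps draining (the class name is illustrative only):

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class MongoWriterBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // ... process the record / write it to MongoDB ...
        collector.ack(input); // without this, unacked tuples pile up until max spout pending is hit
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // this bolt emits nothing downstream
    }
}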

kafka consumer is not able to produce output

I have written a Kafka consumer in Scala. When I run the consumer, nothing shows up on the console.
I have used the code below:
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.ProducerRecord
import scala.collection.JavaConverters._

val topicProducer = "testOutput"
val props = new Properties()
props.put("bootstrap.servers", "host:9092,host:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("group.id", "test")
val kafkaConsumer = new KafkaConsumer[String, String](props)
val topic = Array("test").toList
kafkaConsumer.subscribe(topic.asJava) // subscribe expects a Java collection
val results = kafkaConsumer.poll(2000)
// producer is a KafkaProducer[String, String] created elsewhere
for (record <- results.asScala) {
  producer.send(new ProducerRecord(topicProducer, "key",
    "Value=" + record.key() + " Record Key=" + record.value() + "append"))
}
You also need to specify the auto.offset.reset property so that your consumer can consume messages from the beginning (equivalent to --from-beginning on the command line):
props.put("auto.offset.reset", "earliest");
According to Kafka docs:
auto.offset.reset
What to do when there is no initial offset in ZooKeeper or if an
offset is out of range:
smallest : automatically reset the offset to the smallest offset
largest : automatically reset the offset to the largest offset
anything else: throw exception to the consumer
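For context, a minimal sketch in Java (assuming kafka-clients 2.x; broker and topic names are placeholders) of a consumer that sets auto.offset.reset and keeps polling in a loop instead of calling poll once:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumerLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "host:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("group.id", "test");
        props.put("auto.offset.reset", "earliest"); // read from the beginning when no committed offset exists

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            while (true) { // a single poll may return nothing; keep polling
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.key() + " -> " + record.value());
                }
            }
        }
    }
}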
EDIT:
Alternatively, if you are using the old consumer API, then instead of the bootstrap-server host:9092 parameter use the ZooKeeper parameter --zookeeper host:2181.
If this does not solve the issue, try deleting /brokers in ZooKeeper and then restarting the Kafka nodes:
bin/zookeeper-shell <zk-host>:2181
rmr /brokers

How to find zkRoot and clientId for SpoutConfig

I'm trying to connect to a remote Kafka cluster in Storm. I'm using the following code:
Broker brokerForPartition0 = new Broker("208.113.164.114:9091");
Broker brokerForPartition1 = new Broker("208.113.164.115:9092");
Broker brokerForPartition2 = new Broker("208.113.164.117:9093");
GlobalPartitionInformation partitionInfo = new GlobalPartitionInformation();
partitionInfo.addPartition(0, brokerForPartition2); // mapping from partition 0 to brokerForPartition2
partitionInfo.addPartition(1, brokerForPartition0); // mapping from partition 1 to brokerForPartition0
partitionInfo.addPartition(2, brokerForPartition1); // mapping from partition 2 to brokerForPartition1
StaticHosts hosts = new StaticHosts(partitionInfo);
SpoutConfig spoutConfig = new SpoutConfig(hosts, "newImageTest","/brokers","console-consumer-61818");
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
When instantiating spoutConfig, I have to pass the zkRoot and clientId as parameters:
public SpoutConfig(BrokerHosts hosts, String topic, String zkRoot, String id);
Where can I find these two values? Or should I create them myself?
Thank you!
From this documentation,
Spoutconfig is an extension of KafkaConfig that supports additional
fields with ZooKeeper connection info and for controlling behavior
specific to KafkaSpout. The Zkroot will be used as root to store your
consumer's offset. The id should uniquely identify your spout.
zkRoot, therefore, should be some ZNode path like /some/path, which will be used to store your consumer's offsets, as mentioned.
The id is some string (say, a UUID) that uniquely identifies your spout, as mentioned.
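Putting it together, a minimal sketch for the spout in the question (the zkRoot and id values below are just examples you choose yourself):

// zkRoot is any ZNode path you pick; the committed offsets are stored under it.
// The id should stay the same across redeployments so the spout resumes from
// its last committed offset rather than starting over.
String zkRoot = "/kafka-spout-offsets";
String clientId = "newImageTest-spout";
SpoutConfig spoutConfig = new SpoutConfig(hosts, "newImageTest", zkRoot, clientId);
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

A random UUID works as the id if you never need to resume from stored offsets, but a stable, descriptive string is usually the better choice.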

Kafka - Spring: Kafka consumer reads a message from a topic based on offset

Is there a way to consume a message from a Kafka topic based on an offset?
I mean, I have an offset id that I recorded when I previously published to a topic. Now I need to get a message from the topic based on the offset id that I'm passing.
You can do this with the Java Kafka consumer library; however, you also have to know the partition number.
import java.time.Duration;
import java.util.Arrays;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
long desiredOffset = 10000;
TopicPartition partition = new TopicPartition("some-topic", 0);
consumer.assign(Arrays.asList(partition));
consumer.seek(partition, desiredOffset); // the next poll starts reading at desiredOffset
boolean found = false;
while (!found) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        if (record.offset() == desiredOffset) {
            System.out.println(record);
            found = true;
            break;
        }
    }
}
consumer.close();
Things to consider: the record at your desired offset may have been deleted, depending on the cleanup policy configured for your Kafka topic. Remember, Kafka is a streaming platform; read a message by offset only if you are debugging.
Alternatively, simply use the Kafka console consumer with the required parameters:
bootstrap-server : (comma separated server names : port no.)
topic : (topic name)
partition : (partition number)
offset : (offset value)
max-messages : (No. of maximum messages to consume)
sh kafka-console-consumer.sh --bootstrap-server server1:9092,server2:9092,server3:9092 --topic test_topic --partition 0 --offset 43212345 --max-messages 1

How to change the default Kafka SpoutConfig settings

I am getting a message stream of about 3 MB from a Kafka topic, but the default size limit is 1 MB. I have changed the Kafka properties from 1 MB to 3 MB by adding the lines below to the consumer.properties and server.properties files:
fetch.message.max.bytes=2048576 (consumer.properties)
message.max.bytes=2048576 (server.properties)
replica.fetch.max.bytes=2048576 (server.properties)
After adding these lines, the 3 MB messages are written to the Kafka data logs. But Storm is unable to process that 3 MB data; it can only read the default size, i.e. 1 MB.
So how do I change the configuration in order to process/read the 3 MB data? Here is my topology class.
String argument = args[0];
Config conf = new Config();
conf.put(JDBC_CONF, map);
conf.setDebug(true);
conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
//set the number of workers
conf.setNumWorkers(3);
TopologyBuilder builder = new TopologyBuilder();
//Setup Kafka spout
BrokerHosts hosts = new ZkHosts("localhost:2181");
String topic = "year1234";
String zkRoot = "";
String consumerGroupId = "group1";
SpoutConfig spoutConfig = new SpoutConfig(hosts, topic, zkRoot, consumerGroupId);
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
builder.setSpout("KafkaSpout", kafkaSpout,1);
builder.setBolt("user_details", new Parserspout(),1).shuffleGrouping("KafkaSpout");
builder.setBolt("bolts_user", new bolts_user(cp),1).shuffleGrouping("user_details");
Add the following lines right after constructing the SpoutConfig, so that the spout's fetch size and buffer size are raised above the 1 MB default and can hold the 3 MB messages:
SpoutConfig spoutConfig = new SpoutConfig(hosts, topic, zkRoot, consumerGroupId);
spoutConfig.fetchSizeBytes = 3048576;
spoutConfig.bufferSizeBytes = 3048576;