How to find zkRoot and clientId for SpoutConfig - apache-zookeeper

I'm trying to connect to a remote Kafka cluster in Storm. I'm using the following code:
Broker brokerForPartition0 = new Broker("208.113.164.114:9091");
Broker brokerForPartition1 = new Broker("208.113.164.115:9092");
Broker brokerForPartition2 = new Broker("208.113.164.117:9093");
GlobalPartitionInformation partitionInfo = new GlobalPartitionInformation();
partitionInfo.addPartition(0, brokerForPartition0); // mapping from partition 0 to brokerForPartition0
partitionInfo.addPartition(1, brokerForPartition1); // mapping from partition 1 to brokerForPartition1
partitionInfo.addPartition(2, brokerForPartition2); // mapping from partition 2 to brokerForPartition2
StaticHosts hosts = new StaticHosts(partitionInfo);
SpoutConfig spoutConfig = new SpoutConfig(hosts, "newImageTest","/brokers","console-consumer-61818");
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
When instantiating SpoutConfig, I have to pass the zkRoot and clientId as parameters:
public SpoutConfig(BrokerHosts hosts, String topic, String zkRoot, String id);
Where can I find these two values? Or should I create them myself?
Thank you!

From this documentation,
Spoutconfig is an extension of KafkaConfig that supports additional
fields with ZooKeeper connection info and for controlling behavior
specific to KafkaSpout. The Zkroot will be used as root to store your
consumer's offset. The id should uniquely identify your spout.
zkRoot, therefore, should be a ZNode path like /some/path, which will be used to store your consumer's offsets, as mentioned.
id is a string (say, a UUID) that can be used to uniquely identify your spout, as mentioned.
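For example, with the static hosts from the question, a minimal sketch could look like the following (the ZNode path and the id are placeholders you choose yourself; the spout stores its committed offsets in ZooKeeper under a path derived from both values):
String zkRoot = "/kafka-storm-offsets";      // any ZNode path you pick, e.g. /some/path
String clientId = "newImageTest-spout";      // any string that uniquely identifies this spout
SpoutConfig spoutConfig = new SpoutConfig(hosts, "newImageTest", zkRoot, clientId);
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
Note that if you want the spout to resume from its last committed offset after a restart, the id must stay the same across runs.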

Related

Kafka consumer is very slow to consume data and only consumes the first 500 records

I am trying to integrate MongoDB and Storm-Kafka. The Kafka producer produces data from MongoDB, but the consumer side fails to fetch all the records: it consumes only 500-600 out of 1 million records.
There are no errors in the log file; the topology is still alive but is not processing further records.
Kafka version: 0.10.*, Storm version: 1.2.1
Do I need to add any configs on the consumer side?
conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);
conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2048);
conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
BrokerHosts hosts = new ZkHosts(zookeeperUrl);
SpoutConfig spoutConfig = new SpoutConfig(hosts, topic, zkRoot, consumerGroupId);
spoutConfig.scheme = new KeyValueSchemeAsMultiScheme(new StringKeyValueScheme());
spoutConfig.fetchSizeBytes = 25000000;
if (startFromBeginning) {
    spoutConfig.startOffsetTime = OffsetRequest.EarliestTime();
} else {
    spoutConfig.startOffsetTime = OffsetRequest.LatestTime();
}
return new KafkaSpout(spoutConfig);
}
I want the Kafka spout to read all the records from the Kafka topic that were produced by the producer.

kafka consumer is not able to produce output

I have written a Kafka consumer in Scala. When I run the consumer, it shows nothing on the console.
I am using the code below:
val topicProducer = "testOutput"
val props = new Properties()
props.put("bootstrap.servers","host:9092,host:9092")
props.put("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer")
props.put("group.id", "test");
val kafkaConsumer = new KafkaConsumer[String, String](props)
val topic = Array("test").toList
kafkaConsumer.subscribe(topic)
val results = kafkaConsumer.poll(2000)
for ((record) <- results) {
producer.send(new ProducerRecord(topicProducer,"key","Value="+record.key()+" Record Key="+record.value()+"append"))
}
You also need to specify the auto.offset.reset property so that your consumer can consume messages from the beginning (equivalent to --from-beginning on the command line):
props.put("auto.offset.reset", "earliest");
According to Kafka docs:
auto.offset.reset
What to do when there is no initial offset in ZooKeeper or if an
offset is out of range:
smallest : automatically reset the offset to the smallest offset
largest : automatically reset the offset to the largest offset
anything else: throw exception to the consumer
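For reference, a minimal Java sketch of the same fix, assuming the usual org.apache.kafka.clients.consumer imports and the placeholder host and topic from the question (the new consumer API uses earliest/latest instead of smallest/largest):
Properties props = new Properties();
props.put("bootstrap.servers", "host:9092");
props.put("group.id", "test");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// without this, a consumer group with no committed offset defaults to reading only new messages
props.put("auto.offset.reset", "earliest");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("test"));
ConsumerRecords<String, String> records = consumer.poll(2000);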
EDIT:
Alternatively, if you are using the old consumer API, then instead of --bootstrap-server host:9092 use the ZooKeeper parameter --zookeeper host:2181.
If this does not solve the issue, try deleting /brokers in ZooKeeper and then restarting the Kafka nodes:
bin/zookeeper-shell <zk-host>:2181
rmr /brokers

How to change default kafka SpoutConfig class

I am getting a message stream of 3MB from a Kafka topic, but the default limit is 1MB. I have changed the Kafka properties from 1MB to 3MB by adding the lines below to the Kafka consumer.properties and server.properties files.
fetch.message.max.bytes=2048576 ( consumer.properties )
message.max.bytes=2048576 ( server.properties )
replica.fetch.max.bytes=2048576 ( server.properties )
After adding the above lines, the 3MB messages are written to the Kafka data logs, but Storm is unable to process that 3MB data; it can only read the default size, i.e., 1MB.
So how do I change the configuration in order to process/read the 3MB data? Here is my topology class:
String argument = args[0];
Config conf = new Config();
conf.put(JDBC_CONF, map);
conf.setDebug(true);
conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
//set the number of workers
conf.setNumWorkers(3);
TopologyBuilder builder = new TopologyBuilder();
//Setup Kafka spout
BrokerHosts hosts = new ZkHosts("localhost:2181");
String topic = "year1234";
String zkRoot = "";
String consumerGroupId = "group1";
SpoutConfig spoutConfig = new SpoutConfig(hosts, topic, zkRoot, consumerGroupId);
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
builder.setSpout("KafkaSpout", kafkaSpout,1);
builder.setBolt("user_details", new Parserspout(),1).shuffleGrouping("KafkaSpout");
builder.setBolt("bolts_user", new bolts_user(cp),1).shuffleGrouping("user_details");
Add the following lines below your SpoutConfig instantiation:
SpoutConfig spoutConfig = new SpoutConfig(hosts, topic, zkRoot, consumerGroupId);
spoutConfig.fetchSizeBytes = 3048576;
spoutConfig.bufferSizeBytes = 3048576;
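Put differently, a sketch of the complete change, assuming the broker-side and consumer-side limits from the question are also raised to at least the same size (3 * 1024 * 1024 bytes is exactly 3MB):
SpoutConfig spoutConfig = new SpoutConfig(hosts, topic, zkRoot, consumerGroupId);
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
// both the fetch size and the spout's receive buffer must be large enough for a single 3MB message
spoutConfig.fetchSizeBytes = 3 * 1024 * 1024;
spoutConfig.bufferSizeBytes = 3 * 1024 * 1024;
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);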

Multiple Streams in Trident Topology

I have multiple OpaqueTridentKafkaSpouts reading from different Kafka topics. I want the data from all of these streams to go through the same set of functions. What is the best way to achieve that?
Do I need to create separate streams and pass each tuple through the same set of functions again, like below?
BrokerHosts zk = new ZkHosts(getZooKeeperHosts());
TridentKafkaConfig spoutConf = new TridentKafkaConfig(zk, "Test");
spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());
TridentKafkaConfig spoutConf1 = new TridentKafkaConfig(zk, "Test1");
spoutConf1.scheme = new SchemeAsMultiScheme(new StringScheme());
OpaqueTridentKafkaSpout kafkaSpout = new OpaqueTridentKafkaSpout(spoutConf);
OpaqueTridentKafkaSpout kafkaSpout1 = new OpaqueTridentKafkaSpout(spoutConf1);
topology.newStream("event", kafkaSpout).each(new Fields("document"), new ExtractDocumentInfo(), new Fields("id", "index", "type"));
topology.newStream("event1", kafkaSpout1).each(new Fields("document"), new ExtractDocumentInfo(), new Fields("id", "index", "type"));
You can merge the streams together, but any failure will cause both spouts to replay the batch.
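A minimal sketch of that merge, reusing the two spouts and the ExtractDocumentInfo function from the question (TridentTopology.merge combines streams that emit the same output fields, so the function chain is wired once instead of once per stream):
Stream stream = topology.newStream("event", kafkaSpout);
Stream stream1 = topology.newStream("event1", kafkaSpout1);
// the merged stream goes through the shared function set once
topology.merge(stream, stream1)
        .each(new Fields("document"), new ExtractDocumentInfo(), new Fields("id", "index", "type"));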

Storm KafkaSpout can't read offsets when the topic has more than 1 partition

My KafkaSpout setup is:
SpoutConfig spoutConf = new SpoutConfig(brokerHosts, topic, zkRoot,clientId);
spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConf.forceFromStart = false;
spoutConf.zkServers=...
spoutConf.zkPort = 2181;
spoutConf.zkHost = ...
spoutConf.zkRoot = zkRoot;
just like in Failing to write offset data to zookeeper in kafka-storm,
but I found that when my topic has more than one partition, my KafkaSpout can't read the offsets from ZooKeeper, and no offset entries are created in ZooKeeper.