Apache Kafka Default Encoder Not Working - apache-kafka

I am using Kafka 0.8 beta, and I am just trying to mess around with sending different objects, serializing them using my own encoder, and sending them to an existing broker configuration. For now I am trying to get just DefaultEncoder working.
I have the broker and everything setup and working for StringEncoder, but I am not able to get any other data type, including just pure byte[], to be sent and received by the broker.
My code for the Producer is:
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
import java.util.Date;
import java.util.Properties;
import java.util.Random;
public class ProducerTest {
public static void main(String[] args) {
long events = 5;
Random rnd = new Random();
rnd.setSeed(new Date().getTime());
Properties props = new Properties();
props.setProperty("metadata.broker.list", "localhost:9093,localhost:9094");
props.setProperty("serializer.class", "kafka.serializer.DefaultEncoder");
props.setProperty("partitioner.class", "example.producer.SimplePartitioner");
props.setProperty("request.required.acks", "1");
props.setProperty("producer.type", "async");
props.setProperty("batch.num.messages", "4");
ProducerConfig config = new ProducerConfig(props);
Producer<byte[], byte[]> producer = new Producer<byte[], byte[]>(config);
for (long nEvents = 0; nEvents < events; nEvents++) {
byte[] a = "Hello".getBytes();
byte[] b = "There".getBytes();
KeyedMessage<byte[], byte[]> data = new KeyedMessage<byte[], byte[]>("page_visits", a, b);
producer.send(data);
}
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}
producer.close();
}
}
I used the same SimplePartitioner as in the example given here, and replacing all the byte arrays by Strings and changing the serializer to kafka.serializer.StringEncoder works perfectly.
For reference, SimplePartitioner:
import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;
public class SimplePartitioner implements Partitioner<String> {
public SimplePartitioner (VerifiableProperties props) {
}
public int partition(String key, int a_numPartitions) {
int partition = 0;
int offset = key.lastIndexOf('.');
if (offset > 0) {
partition = Integer.parseInt( key.substring(offset+1)) % a_numPartitions;
}
return partition;
}
}
What am I doing wrong?

The answer is that the partitioning class SimplePartitioner is applicable only for Strings. When I try to run the Producer asynchronously, it creates a separate thread that handles the encoding and partitioning before sending to the broker. This thread hits a roadblock when it realizes that SimplePartitioner works only for Strings, but because it's a separate thread, no Exceptions are thrown, and so the thread just exits without any indication of wrongdoing.
If we change the SimplePartitioner to accept byte[], for instance:
import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;
public class SimplePartitioner implements Partitioner<byte[]> {
public SimplePartitioner (VerifiableProperties props) {
}
public int partition(byte[] key, int a_numPartitions) {
int partition = 0;
return partition;
}
}
This works perfectly now.

Related

Kafka: Consumer api: Regression test fails if runs in a group (sequentially)

I have implemented a kafka application using consumer api. And I have 2 regression tests implemented with stream api:
To test happy path: by producing data from the test ( into the input topic that the application is listening to) that will be consumed by the application and application will produce data (into the output topic ) that the test will consume and validate against expected output data.
To test error path: behavior is the same as above. Although this time application will produce data into output topic and test will consume from application's error topic and will validate against expected error output.
My code and the regression-test codes are residing under the same project under expected directory structure. Both time ( for both tests) data should have been picked up by the same listener at the application side.
The problem is :
When I am executing the tests individually (manually), each test is passing. However, If I execute them together but sequentially ( for example: gradle clean build ) , only first test is passing. 2nd test is failing after the test-side-consumer polling for data and after some time it gives up not finding any data.
Observation:
From debugging, it looks like, the 1st time everything works perfectly ( test-side and application-side producers and consumers). However, during the 2nd test it seems that application-side-consumer is not receiving any data ( It seems that test-side-producer is producing data, but can not say that for sure) and hence no data is being produced into the error topic.
What I have tried so far:
After investigations, my understanding is that we are getting into race conditions and to avoid that found suggestions like :
use #DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
Tear off broker after each test ( Please see the ".destry()" on brokers)
use different topic names for each test
I applied all of them and still could not recover from my issue.
I am providing the code here for perusal. Any insight is appreciated.
Code for 1st test (Testing error path):
#DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
#EmbeddedKafka(
partitions = 1,
controlledShutdown = false,
topics = {
AdapterStreamProperties.Constants.INPUT_TOPIC,
AdapterStreamProperties.Constants.ERROR_TOPIC
},
brokerProperties = {
"listeners=PLAINTEXT://localhost:9092",
"port=9092",
"log.dir=/tmp/data/logs",
"auto.create.topics.enable=true",
"delete.topic.enable=true"
}
)
public class AbstractIntegrationFailurePathTest {
private final int retryLimit = 0;
#Autowired
protected EmbeddedKafkaBroker embeddedFailurePathKafkaBroker;
//To produce data
#Autowired
protected KafkaTemplate<PreferredMediaMsgKey, SendEmailCmd> inputProducerTemplate;
//To read from output error
#Autowired
protected Consumer<PreferredMediaMsgKey, ErrorCmd> outputErrorConsumer;
//Service to execute notification-preference
#Autowired
protected AdapterStreamProperties projectProerties;
protected void subscribe(Consumer consumer, String topic, int attempt) {
try {
embeddedFailurePathKafkaBroker.consumeFromAnEmbeddedTopic(consumer, topic);
} catch (ComparisonFailure ex) {
if (attempt < retryLimit) {
subscribe(consumer, topic, attempt + 1);
}
}
}
}
.
#TestConfiguration
public class AdapterStreamFailurePathTestConfig {
#Autowired
private EmbeddedKafkaBroker embeddedKafkaBroker;
#Value("${spring.kafka.adapter.application-id}")
private String applicationId;
#Value("${spring.kafka.adapter.group-id}")
private String groupId;
//Producer of records that the program consumes
#Bean
public Map<String, Object> sendEmailCmdProducerConfigs() {
Map<String, Object> results = KafkaTestUtils.producerProps(embeddedKafkaBroker);
results.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
AdapterStreamProperties.Constants.KEY_SERDE.serializer().getClass());
results.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
AdapterStreamProperties.Constants.INPUT_VALUE_SERDE.serializer().getClass());
return results;
}
#Bean
public ProducerFactory<PreferredMediaMsgKey, SendEmailCmd> inputProducerFactory() {
return new DefaultKafkaProducerFactory<>(sendEmailCmdProducerConfigs());
}
#Bean
public KafkaTemplate<PreferredMediaMsgKey, SendEmailCmd> inputProducerTemplate() {
return new KafkaTemplate<>(inputProducerFactory());
}
//Consumer of the error output, generated by the program
#Bean
public Map<String, Object> outputErrorConsumerConfig() {
Map<String, Object> props = KafkaTestUtils.consumerProps(
applicationId, Boolean.TRUE.toString(), embeddedKafkaBroker);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
AdapterStreamProperties.Constants.KEY_SERDE.deserializer().getClass()
.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
AdapterStreamProperties.Constants.ERROR_VALUE_SERDE.deserializer().getClass()
.getName());
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
return props;
}
#Bean
public Consumer<PreferredMediaMsgKey, ErrorCmd> outputErrorConsumer() {
DefaultKafkaConsumerFactory<PreferredMediaMsgKey, ErrorCmd> rpf =
new DefaultKafkaConsumerFactory<>(outputErrorConsumerConfig());
return rpf.createConsumer(groupId, "notification-failure");
}
}
.
#RunWith(SpringRunner.class)
#SpringBootTest(classes = AdapterStreamFailurePathTestConfig.class)
#ActiveProfiles(profiles = "errtest")
public class ErrorPath400Test extends AbstractIntegrationFailurePathTest {
#Autowired
private DataGenaratorForErrorPath400Test datagen;
#Mock
private AdapterHttpClient httpClient;
#Autowired
private ErroredEmailCmdDeserializer erroredEmailCmdDeserializer;
#Before
public void setup() throws InterruptedException {
Mockito.when(httpClient.callApi(Mockito.any()))
.thenReturn(
new GenericResponse(
400,
TestConstants.ERROR_MSG_TO_CHK));
Mockito.when(httpClient.createURI(Mockito.any(),Mockito.any(),Mockito.any())).thenCallRealMethod();
inputProducerTemplate.send(
projectProerties.getInputTopic(),
datagen.getKey(),
datagen.getEmailCmdToProduce());
System.out.println("producer: "+ projectProerties.getInputTopic());
subscribe(outputErrorConsumer , projectProerties.getErrorTopic(), 0);
}
#Test
public void testWithError() throws InterruptedException, InvalidProtocolBufferException, TextFormat.ParseException {
ConsumerRecords<PreferredMediaMsgKeyBuf.PreferredMediaMsgKey, ErrorCommandBuf.ErrorCmd> records;
List<ConsumerRecord<PreferredMediaMsgKeyBuf.PreferredMediaMsgKey, ErrorCommandBuf.ErrorCmd>> outputListOfErrors = new ArrayList<>();
int attempt = 0;
int expectedRecords = 1;
do {
records = KafkaTestUtils.getRecords(outputErrorConsumer);
records.forEach(outputListOfErrors::add);
attempt++;
} while (attempt < expectedRecords && outputListOfErrors.size() < expectedRecords);
//Verify the recipient event stream size
Assert.assertEquals(expectedRecords, outputListOfErrors.size());
//Validate output
}
#After
public void tearDown() {
outputErrorConsumer.close();
embeddedFailurePathKafkaBroker.destroy();
}
}
2nd test is almost the same in structure. Although this time the test-side-consumer is consuming from application-side-output-topic( instead of error topic). And I named the consumers,broker,producer,topics differently. Like :
#DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
#EmbeddedKafka(
partitions = 1,
controlledShutdown = false,
topics = {
AdapterStreamProperties.Constants.INPUT_TOPIC,
AdapterStreamProperties.Constants.OUTPUT_TOPIC
},
brokerProperties = {
"listeners=PLAINTEXT://localhost:9092",
"port=9092",
"log.dir=/tmp/data/logs",
"auto.create.topics.enable=true",
"delete.topic.enable=true"
}
)
public class AbstractIntegrationSuccessPathTest {
private final int retryLimit = 0;
#Autowired
protected EmbeddedKafkaBroker embeddedKafkaBroker;
//To produce data
#Autowired
protected KafkaTemplate<PreferredMediaMsgKey,SendEmailCmd> sendEmailCmdProducerTemplate;
//To read from output regular topic
#Autowired
protected Consumer<PreferredMediaMsgKey, NotifiedEmailCmd> ouputConsumer;
//Service to execute notification-preference
#Autowired
protected AdapterStreamProperties projectProerties;
protected void subscribe(Consumer consumer, String topic, int attempt) {
try {
embeddedKafkaBroker.consumeFromAnEmbeddedTopic(consumer, topic);
} catch (ComparisonFailure ex) {
if (attempt < retryLimit) {
subscribe(consumer, topic, attempt + 1);
}
}
}
}
Please let me know if I should provide any more information.,
"port=9092"
Don't use a fixed port; leave that out and the embedded broker will use a random port; the consumer configs are set up in KafkaTestUtils to point to the random port.
You shouldn't need to dirty the context after each test method - use a different group.id for each test and a different topic.
In my case the consumer was not closed properly. I had to do :
#After
public void tearDown() {
// shutdown hook to correctly close the streams application
Runtime.getRuntime().addShutdownHook(new Thread(ouputConsumer::close));
}
to resolve.

Can pyspark implement custom serialized objects

I plan to use kafaka to send data in pyspark. I find out through searching materials that I need to make a customized serialization of producer so that I can broadcast the object. But how can I realize this function in pyspark?
spark==2.2.1
This is how it is implemented in scala:
public class KafkaProducer implements Serializable {
public static final String METADATA_BROKER_LIST_KEY = "metadata.broker.list";
public static final String SERIALIZER_CLASS_KEY = "serializer.class";
public static final String SERIALIZER_CLASS_VALUE = "kafka.serializer.StringEncoder";
private static KafkaProducer instance = null;
private Producer producer;
private KafkaProducer(String brokerList) {
Preconditions.checkArgument(StringUtils.isNotBlank(brokerList), "kafka brokerList is blank...");
// set properties
Properties properties = new Properties();
properties.put(METADATA_BROKER_LIST_KEY, brokerList);
properties.put(SERIALIZER_CLASS_KEY, SERIALIZER_CLASS_VALUE);
properties.put("kafka.message.CompressionCodec", "1");
properties.put("client.id", "streaming-kafka-output");
ProducerConfig producerConfig = new ProducerConfig(properties);
this.producer = new Producer(producerConfig);
}
public static synchronized KafkaProducer getInstance(String brokerList) {
if (instance == null) {
instance = new KafkaProducer(brokerList);
System.out.println("初始化 kafka producer...");
}
return instance;
}
// 单条发送
public void send(KeyedMessage<String, String> keyedMessage) {
producer.send(keyedMessage);
}
// 批量发送
public void send(List<KeyedMessage<String, String>> keyedMessageList) {
producer.send(keyedMessageList);
}
public void shutdown() {
producer.close();
}
}
How does this work in pysaprk?
hope below example helps you:
from pyspark.context import SparkContext
from pyspark.serializers import MarshalSerializer
sc = SparkContext("local", "serialization app", serializer = MarshalSerializer())
print(sc.parallelize(list(range(1000))).map(lambda x: 2 * x).take(10))
sc.stop()
Note: you can use PickleSerializer or MarshalSerializer
MarshalSerializer is faster than PickleSerializer but supports fewer datatypes.

How to read all the records in a Kafka topic

I am using kafka : kafka_2.12-2.1.0, spring kafka on client side and have got stuck with an issue.
I need to load an in-memory map by reading all the existing messages within a kafka topic. I did this by starting a new consumer (with a unique consumer group id and setting the offset to earliest). Then I iterate over the consumer (poll method) to get all messages and stop when the consumer records become empty.
But I noticed that, when I start polling, the first few iterations return consumer records as empty and then it starts returning the actual records. Now this breaks my logic as our code thinks there are no records in the topic.
I have tried few other ways (like using offsets number) but haven't been able to come up with any solution, apart from keeping another record somewhere which tells me how many messages there are in the topic which needs to be read before I stop.
Any idea's please ?
To my understanding, what you are trying to achieve is to have a map constructed in your application based on the values that are already in a specific Topic.
For this task, instead of manually polling the topic, you can use Ktable in Kafka Streams DSL which will automatically construct a readable key-value store which is fault tolerant, replication enabled and automatically filled with new values.
You can do this simply by calling groupByKey on a stream and then using the aggregate.
KStreamBuilder builder = new KStreamBuilder();
KStream<String, Long> myKStream = builder.stream(Serdes.String(), Serdes.Long(), "topic_name");
KTable<String, Long> totalCount = myKStream.groupByKey().aggregate(this::initializer, this::aggregator);
(The actual code may vary depending on the kafka version, your configurations, etc..)
Read more about Kafka Stream concepts here
Then I iterate over the consumer (poll method) to get all messages and stop when the consumer records become empty
Kafka is a message streaming platform. Any data you stream is being updated continuously and you probably should not use it in a way that you expect the consuming to stop after a certain number of messages. How will you handle if a new message comes in after you stop the consumer?
Also the reason you are getting null records maybe probably related to records being in different partitions, etc..
What is your specific use case here?, There might be a good way to do it with the Kafka semantics itself.
You have to use 2 consumers one to load the offsets and another one to read all the records.
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;
public class KafkaRecordReader {
static final Map<String, Object> props = new HashMap<>();
static {
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
props.put(ConsumerConfig.CLIENT_ID_CONFIG, "sample-client");
}
public static void main(String[] args) {
final Map<TopicPartition, OffsetInfo> partitionOffsetInfos = getOffsets(Arrays.asList("world, sample"));
final List<ConsumerRecord<byte[], byte[]>> records = readRecords(partitionOffsetInfos);
System.out.println(partitionOffsetInfos);
System.out.println("Read : " + records.size() + " records");
}
private static List<ConsumerRecord<byte[], byte[]>> readRecords(final Map<TopicPartition, OffsetInfo> offsetInfos) {
final Properties readerProps = new Properties();
readerProps.putAll(props);
readerProps.put(ConsumerConfig.CLIENT_ID_CONFIG, "record-reader");
final Map<TopicPartition, Boolean> partitionToReadStatusMap = new HashMap<>();
offsetInfos.forEach((tp, offsetInfo) -> {
partitionToReadStatusMap.put(tp, offsetInfo.beginOffset == offsetInfo.endOffset);
});
final List<ConsumerRecord<byte[], byte[]>> cachedRecords = new ArrayList<>();
try (final KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(readerProps)) {
consumer.assign(offsetInfos.keySet());
for (final Map.Entry<TopicPartition, OffsetInfo> entry : offsetInfos.entrySet()) {
consumer.seek(entry.getKey(), entry.getValue().beginOffset);
}
boolean close = false;
while (!close) {
final ConsumerRecords<byte[], byte[]> consumerRecords = consumer.poll(Duration.ofMillis(100));
for (final ConsumerRecord<byte[], byte[]> record : consumerRecords) {
cachedRecords.add(record);
final TopicPartition currentTp = new TopicPartition(record.topic(), record.partition());
if (record.offset() + 1 == offsetInfos.get(currentTp).endOffset) {
partitionToReadStatusMap.put(currentTp, true);
}
}
boolean done = true;
for (final Map.Entry<TopicPartition, Boolean> entry : partitionToReadStatusMap.entrySet()) {
done &= entry.getValue();
}
close = done;
}
}
return cachedRecords;
}
private static Map<TopicPartition, OffsetInfo> getOffsets(final List<String> topics) {
final Properties offsetReaderProps = new Properties();
offsetReaderProps.putAll(props);
offsetReaderProps.put(ConsumerConfig.CLIENT_ID_CONFIG, "offset-reader");
final Map<TopicPartition, OffsetInfo> partitionOffsetInfo = new HashMap<>();
try (final KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(offsetReaderProps)) {
final List<PartitionInfo> partitionInfos = new ArrayList<>();
topics.forEach(topic -> partitionInfos.addAll(consumer.partitionsFor("sample")));
final Set<TopicPartition> topicPartitions = partitionInfos
.stream()
.map(x -> new TopicPartition(x.topic(), x.partition()))
.collect(Collectors.toSet());
consumer.assign(topicPartitions);
final Map<TopicPartition, Long> beginningOffsets = consumer.beginningOffsets(topicPartitions);
final Map<TopicPartition, Long> endOffsets = consumer.endOffsets(topicPartitions);
for (final TopicPartition tp : topicPartitions) {
partitionOffsetInfo.put(tp, new OffsetInfo(beginningOffsets.get(tp), endOffsets.get(tp)));
}
}
return partitionOffsetInfo;
}
private static class OffsetInfo {
private final long beginOffset;
private final long endOffset;
private OffsetInfo(long beginOffset, long endOffset) {
this.beginOffset = beginOffset;
this.endOffset = endOffset;
}
#Override
public String toString() {
return "OffsetInfo{" +
"beginOffset=" + beginOffset +
", endOffset=" + endOffset +
'}';
}
#Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
OffsetInfo that = (OffsetInfo) o;
return beginOffset == that.beginOffset &&
endOffset == that.endOffset;
}
#Override
public int hashCode() {
return Objects.hash(beginOffset, endOffset);
}
}
}
Adding to the above answer from #arshad, the reason you are not getting the records is because you have already read them. See this answer here using earliest or latest does not matter on the consumer after you have a committed offset for the partition
I would use a seek to the beginning or the particular offset if you knew the starting offset.

Storm Kafka Topolgy terminates without output

This is the StBolt.java class.
package com.storm.cassandra;
import java.util.Map;
import net.sf.json.JSONObject;
import net.sf.json.JSONSerializer;
import org.apache.log4j.Logger;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.IBasicBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
public class StBolt implements IBasicBolt {
private static final long serialVersionUID = 1L;
private static final Logger logger = Logger
.getLogger(StBolt.class);
private static Session session = null;
private Cluster cluster = null;
String cassandraURL;
JSONObject eventJson = null;
String topicname = null;
String ip = null;
String menu = null;
String product = null;
Row row = null;
com.datastax.driver.core.ResultSet viewcount = null;
com.datastax.driver.core.ResultSet segmentlistResult = null;
com.datastax.driver.core.ResultSet newCountUpdatedResult = null;
public StBolt(String topicname) {
this.topicname = topicname;
}
public void prepare(Map stormConf, TopologyContext topologyContext) {
cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
System.out.println("load cassandra ip");
session = cluster.connect();
System.out.println("CassandraCounterBolt prepare method ended");
}
public void execute(Tuple input, BasicOutputCollector collector) {
System.out.println("Execute");
Fields fields = input.getFields();
try {
eventJson = (JSONObject) JSONSerializer.toJSON((String) input
.getValueByField(fields.get(0)));
topicname = (String) eventJson.get("topicName");
ip = (String) eventJson.get("ip");
menu = (String) eventJson.get("menu");
product = (String) eventJson.get("product");
String ievent = "ievent";
String install = "install";
viewcount = session
.execute("update webapp.viewcount set count=count+1 where topicname='"+topicname+
"'and ip= '"+ip+"'and menu='"+menu+"'and product='"+product+"'" );
} catch (Exception e) {
e.printStackTrace();
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
public void cleanup() {
}
}
Here is the StTopology.java class
package com.storm.cassandra;
import org.apache.storm.kafka.BrokerHosts;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.TopologyBuilder;
public class StTopology {
public static void main(String[] args) throws Exception {
if (args.length == 4) {
BrokerHosts hosts = new ZkHosts("localhost:2181");
//System.out
//.println("Insufficent Arguements - topologyName kafkaTopic ZKRoot ID");
SpoutConfig kafkaConf1 = new SpoutConfig(hosts, args[1], args[2],
args[3]);
//System.out
//.println("Insufficent Arguements - topologyName kafkaTopic ZKRoot ID");
//kafkaConf1.forceFromStart = false;
kafkaConf1.zkRoot = args[2];
kafkaConf1.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout1 = new KafkaSpout(kafkaConf1);
StBolt countbolt = new StBolt(args[1]);
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafkaspout", kafkaSpout1, 1);
builder.setBolt("counterbolt", countbolt, 1).shuffleGrouping(
"kafkaspout");
Config config = new Config();
config.setDebug(true);
config.put(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS, 1);
config.setNumWorkers(1);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(args[0], config, builder.createTopology());
// StormSubmitter.submitTopology(args[0], config,
// builder.createTopology());
} else {
System.out
.println("Insufficent Arguements - topologyName kafkaTopic ZKRoot ID");
}
}
}
I am trying to get JSON data from the Kafka console producer, process it in Storm and store it into Cassandra.
For some reason, there is no response from the bolt when I run the code with parameters viewcount usercount /kafkastorm webapp1.
I have Kafka getting data from the console producer as topic usercount, and the correct table in Cassandra.
The code compiles and runs without any error but the console shows terminated.
I have no activity anywhere, despite providing the right JSON input to the Kafka console producer multiple times {"topicname":"usercount","ip":"127.0.0.1","menu":"dress","product":"tshirt"}.
There is no topology shown as being created in the Storm UI's Topology Summary either.
I believe I have all the Kafka, Storm and Cassandra dependencies in place.
Please point me in the right direction with this issue. Thanks.

apache kafka throwing an exception for scala

I am trying to compile and run a simple kafka code that is a sample from Aapche.When compiling I am getting the following exception, even after adding all the lib files for scala (i guess).
Exception in thread "main" java.lang.NullPointerException
at scala.Predef$.Integer2int(Predef.scala:303)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:103)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:44)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:194)
at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:44)
at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:102)
at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
at kafka.producer.Producer.<init>(Producer.scala:60)
at kafka.javaapi.producer.Producer.<init>(Producer.scala:26)
at kafkaTest.TestProducer.main(TestProducer.java:23)
This is my program:
package kafkaTest;
import java.util.*;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
public class TestProducer {
public static void main(String[] args) {
// long events = Long.parseLong(args[0]);
long events = 10l;
Random rnd = new Random();
Properties props = new Properties();
props.put("metadata.broker.list", "broker1:9092,broker2:9092 ");
props.put("serializer.class", "kafka.serializer.StringEncoder");
***![props.put("partitioner.class", "kafkaTest.SimplePartitioner");][1]***//this is line no 23
props.put("request.required.acks", "1");
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);
for (long nEvents = 0; nEvents < events; nEvents++) { long runtime =
new Date().getTime(); String ip = "192.168.2.1" + rnd.nextInt(255);
String msg = runtime + ",www.example.com," + ip; KeyedMessage<String,
String> data = new KeyedMessage<String, String>("page_visits", ip,
msg); producer.send(data); }
producer.close();
}
}
The attached is the screen shot of library files.
Please let me know the cause of error/exception.
Edit: this is SimplePartitioner.java
package kafkaTest;
import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;
public class SimplePartitioner implements Partitioner {
public SimplePartitioner(VerifiableProperties props) {
}
public int partition(Object key, int a_numPartitions) {
int partition = 0;
String stringKey = (String) key;
int offset = stringKey.lastIndexOf('.');
if (offset > 0) {
partition = Integer.parseInt(stringKey.substring(offset + 1))
% a_numPartitions;
}
return partition;
}
}
There's a space at the end of your broker list :
props.put("metadata.broker.list", "broker1:9092,broker2:9092 ");
Remove it and it should work fine then :
props.put("metadata.broker.list", "broker1:9092,broker2:9092");
I also got this error when metadata.broker.list has a broker with no port number.