state store may have migrated to another instance - apache-kafka

When I try to access a state store from my Kafka Streams application, I get the following error:
The state store, count-store, may have migrated to another instance.
The error occurs when I request a ReadOnlyKeyValueStore from the KafkaStreams instance. It says the store may have migrated to another instance, but I have only one broker up and running.
/**
*
*/
package com.ms.kafka.com.ms.stream;
import java.util.Properties;
import java.util.stream.Stream;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.InvalidStateStoreException;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreType;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
import com.ms.kafka.com.ms.entity.TrackingEvent;
import com.ms.kafka.com.ms.entity.TrackingEventDeserializer;
import com.ms.kafka.com.ms.entity.TrackingEvnetSerializer;
/**
* @author vettri
*
*/
public class EventStreamer {
/**
*
*/
public EventStreamer() {
// TODO Auto-generated constructor stub
}
public static void main(String[] args) throws InterruptedException {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "trackeventstream_stream");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.CLIENT_ID_CONFIG,"testappdi");
props.put("auto.offset.reset","earliest");
/*
* props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
* Serdes.String().getClass());
* props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
* Serdes.String().getClass());
*/
final StreamsBuilder builder = new StreamsBuilder();
final KStream<String , TrackingEvent> eventStream = builder.stream("rt_event_command_topic_stream",Consumed.with(Serdes.String(),
Serdes.serdeFrom(new TrackingEvnetSerializer(), new TrackingEventDeserializer())));
KTable<String, Long> groupedByUniqueId = eventStream.groupBy((k,v) -> v.getUniqueid()).
count(Materialized.as("count-store"));
/*
* KTable<Integer, Integer> table = builder.table( "rt_event_topic_stream",
* Materialized.as("queryable-store-name"));
*/
//eventStream.filter((k,v) -> "9de3b676-b20f-4b7a-878b-526fd5948a34".equalsIgnoreCase(v.getUniqueid())).foreach((k,v) -> System.out.println(v));
final KafkaStreams stream = new KafkaStreams(builder.build(), props);
stream.cleanUp();
stream.start();
System.out.println("Stream state : "+stream.state().name());
String queryableStoreName = groupedByUniqueId.queryableStoreName();
/*
* ReadOnlyKeyValueStore keyValStore1 =
* waitUntilStoreIsQueryable(queryableStoreName, (QueryableStoreTypes)
* QueryableStoreTypes.keyValueStore(),stream);
*/
// count() produces String keys and Long values, so the store is queried with those types
ReadOnlyKeyValueStore<String, Long> keyValStore = stream.store(queryableStoreName, QueryableStoreTypes.<String, Long>keyValueStore());
// System.out.println("results --> "+keyValStore.get((long) 158));
//streams.close();
}
public static <T> T waitUntilStoreIsQueryable(final String storeName,
final QueryableStoreType<T> queryableStoreType, final KafkaStreams streams) throws InterruptedException {
while (true) {
try {
return streams.store(storeName, queryableStoreType);
} catch (InvalidStateStoreException ignored) {
// store not yet ready for querying
System.out.println("waiting for the state store to become ready");
Thread.sleep(100);
//streams.close();
}
}
}
}
I need to retrieve the data that I stored in the state store.
What I am trying to do is store the data locally and then retrieve it.

In your case the local KafkaStreams instance is not ready yet, so its local state stores cannot be queried yet.
Before querying, wait for KafkaStreams to reach the RUNNING state. You need to call your waitUntilStoreIsQueryable(...) method instead of calling stream.store(...) directly.
Examples can be found in the Confluent GitHub repository:
waitUntilStoreIsQueryable(...) definition
usage of waitUntilStoreIsQueryable(...)
More details regarding cause can be found here: https://docs.confluent.io/current/streams/faq.html#handling-invalidstatestoreexception-the-state-store-may-have-migrated-to-another-instance
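As a rough illustration of the retry idea behind waitUntilStoreIsQueryable(...), here is a stdlib-only Java sketch with a bounded number of attempts. The names StoreRetry and retryUntilAvailable are hypothetical and not part of Kafka Streams; in the application above, the lookup would be the stream.store(...) call and the retried exception would be InvalidStateStoreException.

```java
import java.util.function.Supplier;

final class StoreRetry {
    /**
     * Calls lookup until it stops throwing retryOn, sleeping between attempts.
     * Gives up after maxAttempts and rethrows the last failure.
     */
    static <T> T retryUntilAvailable(Supplier<T> lookup,
                                     Class<? extends RuntimeException> retryOn,
                                     int maxAttempts,
                                     long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lookup.get();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // a different failure: do not hide it behind retries
                }
                last = e;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", ie);
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        // Stand-in for a store that becomes available on the third lookup.
        int[] calls = {0};
        String store = retryUntilAvailable(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("store not ready");
            }
            return "count-store";
        }, IllegalStateException.class, 10, 10L);
        System.out.println(store + " available after " + calls[0] + " attempts");
    }
}
```

Bounding the attempts is a design choice of this sketch: an unbounded loop would spin forever if, for example, the store name were misspelled.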

Related

KTable not updating (immediately) when new messages are put on input stream

I have a Kafka topic of Strings with an arbitrary key. I want to create a topic of character : value pairs, one pair per character in the string, e.g.:
input("key","value") -> outputs (["v","value"],["a","value"],...)
To keep it simple, my input topic has a single partition, and thus the KTable code should be receiving all messages to a single instance.
I have created the following sandbox code, which builds the new table just fine, but doesn't update when a new item is put into the original topic:
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
public class Sandbox
{
private final static String kafkaBootstrapServers = "192.168.1.254:9092";
private final static String kafkaGlobalTablesDirectory = "C:\\Kafka\\tmp\\kafka-streams-global-tables\\";
private final static String topic = "sandbox";
private static KafkaStreams streams;
public static void main(String[] args)
{
// 1. set up the test data
Properties producerProperties = new Properties();
producerProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBootstrapServers);
producerProperties.put(ProducerConfig.CLIENT_ID_CONFIG, Sandbox.class.getName() + "_testProducer");
producerProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
producerProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
Producer<String, String> sandboxProducer = new KafkaProducer<>(producerProperties);
sandboxProducer.send(new ProducerRecord<String, String>(topic,"uvw","uvw"));
// 2. read the test data and check it's working
ReadOnlyKeyValueStore<String, String> store = getStore();
printStore(store.all());
System.out.println("-------------ADDING NEW VALUE----------------");
sandboxProducer.send(new ProducerRecord<String, String>(topic,"xyz","xyz"));
System.out.println("-------------ADDED NEW VALUE----------------");
printStore(store.all());
sandboxProducer.close();
streams.close();
}
private static void printStore(KeyValueIterator<String, String> i)
{
System.out.println("-------------PRINT START----------------");
while (i.hasNext())
{
KeyValue<String, String> n = i.next();
System.out.println(n.key + ":" + String.join(",", n.value));
}
System.out.println("-------------PRINT END----------------");
}
private static ReadOnlyKeyValueStore<String, String> getStore()
{
ReadOnlyKeyValueStore<String, String> store = null;
String storeString = "sandbox_store";
StreamsBuilder builder = new StreamsBuilder();
builder.stream(topic
, Consumed.with(Serdes.String(),Serdes.String()))
.filter((k,v)->v!=null)
.flatMap((k,v)->{
Set<KeyValue<String, String>> results = new LinkedHashSet<>();
if (v != null)
{
for (char subChar : v.toCharArray())
{
results.add(KeyValue.<String, String>pair(new String(new char[] {subChar}), v));
}
}
return results;
})
.groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
.aggregate(()->new String()
, (key, value, agg) -> {
agg = agg + value;
return agg;
}
,Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(storeString)
.withKeySerde(Serdes.String())
.withValueSerde(Serdes.String()));
final Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "sandbox");
streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, Sandbox.class.getName());
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBootstrapServers);
streamsConfiguration.put(StreamsConfig.STATE_DIR_CONFIG, kafkaGlobalTablesDirectory + "Sandbox");
streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
streams = new KafkaStreams(builder.build(), streamsConfiguration);
streams.setUncaughtExceptionHandler((Thread thread, Throwable throwable) -> {
System.out.println("Exception on thread " + thread.getName() + ":" + throwable.getLocalizedMessage());
});
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
streams.cleanUp(); // clear any old streams data - forces a rebuild of the local caches.
streams.start(); // hangs until the global table is built
StoreQueryParameters<ReadOnlyKeyValueStore<String, String>> storeSqp
= StoreQueryParameters.fromNameAndType(storeString
,QueryableStoreTypes.<String, String>keyValueStore());
// this while loop gives time for Kafka Streams to start up properly before creating the store
while (store == null)
{
try {
TimeUnit.SECONDS.sleep(1);
store = streams.store(storeSqp);
System.out.println("Store " + storeString + " Created successfully.");
} catch (InterruptedException e) {
}
catch (Exception e) {
System.out.println("Exception creating store " + storeString + ". Will try again in 1 second. Message: " + e.getLocalizedMessage());
}
}
return store;
}
}
The output I am getting is as follows:
Store sandbox_store Created successfully.
-------------PRINT START----------------
u:uvw
v:uvw
w:uvw
-------------PRINT END----------------
-------------ADDING NEW VALUE----------------
-------------ADDED NEW VALUE----------------
-------------PRINT START----------------
u:uvw
v:uvw
w:uvw
-------------PRINT END----------------
Note that the xyz I added has gone missing!
(P.S. I know I could use reduce instead of aggregate, but in practice the new value would be a different type, not a string, so it wouldn't work for my actual use case.)
Now, if I add a 10-second pause before printing the second time, or if I restart the Sandbox class without clearing the topic, the first xyz shows up. So clearly there's a time delay somewhere in the system. In practice I'm dealing with 300 MB+ of messages all going onto the input topic at once, once an hour, so the delay is even longer than just a few seconds.
How can I help speed things up?

Can pyspark implement custom serialized objects

I plan to use Kafka to send data from PySpark. From what I found while searching, I need a custom serializable producer so that I can broadcast the object. How can I implement this in PySpark?
spark==2.2.1
This is how it is implemented in Java:
public class KafkaProducer implements Serializable {
public static final String METADATA_BROKER_LIST_KEY = "metadata.broker.list";
public static final String SERIALIZER_CLASS_KEY = "serializer.class";
public static final String SERIALIZER_CLASS_VALUE = "kafka.serializer.StringEncoder";
private static KafkaProducer instance = null;
private Producer producer;
private KafkaProducer(String brokerList) {
Preconditions.checkArgument(StringUtils.isNotBlank(brokerList), "kafka brokerList is blank...");
// set properties
Properties properties = new Properties();
properties.put(METADATA_BROKER_LIST_KEY, brokerList);
properties.put(SERIALIZER_CLASS_KEY, SERIALIZER_CLASS_VALUE);
properties.put("kafka.message.CompressionCodec", "1");
properties.put("client.id", "streaming-kafka-output");
ProducerConfig producerConfig = new ProducerConfig(properties);
this.producer = new Producer(producerConfig);
}
public static synchronized KafkaProducer getInstance(String brokerList) {
if (instance == null) {
instance = new KafkaProducer(brokerList);
System.out.println("Initializing kafka producer...");
}
return instance;
}
// send a single message
public void send(KeyedMessage<String, String> keyedMessage) {
producer.send(keyedMessage);
}
// send a batch of messages
public void send(List<KeyedMessage<String, String>> keyedMessageList) {
producer.send(keyedMessageList);
}
public void shutdown() {
producer.close();
}
}
How does this work in PySpark?
Hope the example below helps:
from pyspark.context import SparkContext
from pyspark.serializers import MarshalSerializer
sc = SparkContext("local", "serialization app", serializer = MarshalSerializer())
print(sc.parallelize(list(range(1000))).map(lambda x: 2 * x).take(10))
sc.stop()
Note: you can use PickleSerializer or MarshalSerializer
MarshalSerializer is faster than PickleSerializer but supports fewer datatypes.

Exponential backoff with message order guarantee using spring-kafka

I'm trying to implement a Spring Boot-based Kafka consumer that has some very strong message delivery guarantees, even in the case of an error:
messages from a partition must be processed in order,
if message processing fails, the consumption of the particular partition should be suspended,
the processing should be retried with a backoff, until it succeeds.
Our current implementation fulfills these requirements:
@Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.setRetryTemplate(retryTemplate());
final ContainerProperties containerProperties = factory.getContainerProperties();
containerProperties.setAckMode(AckMode.MANUAL_IMMEDIATE);
containerProperties.setErrorHandler(errorHandler());
return factory;
}
@Bean
public RetryTemplate retryTemplate() {
final ExponentialBackOffPolicy backOffPolicy = new ExponentialBackOffPolicy();
backOffPolicy.setInitialInterval(1000);
backOffPolicy.setMultiplier(1.5);
final RetryTemplate template = new RetryTemplate();
template.setRetryPolicy(new AlwaysRetryPolicy());
template.setBackOffPolicy(backOffPolicy);
return template;
}
@Bean
public ErrorHandler errorHandler() {
return new SeekToCurrentErrorHandler();
}
However, here, the record is locked by the consumer forever. At some point, the processing time will exceed max.poll.interval.ms and the server will reassign the partition to some other consumer, thus creating a duplicate.
Assuming max.poll.interval.ms equal to 5 mins (default) and the failure lasting 30 mins, this will cause the message to be processed ca. 6 times.
Another possibility is to return the messages to the queue after N retries (e.g. 3 attempts), by using SimpleRetryPolicy. Then, the message will be replayed (thanks to SeekToCurrentErrorHandler) and the processing will start from scratch, again for up to N attempts. This results in delays forming a series, e.g.
10 secs -> 30 secs -> 90 secs -> 10 secs -> 30 secs -> 90 secs -> ...
which is less desirable than a constantly rising one :)
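To make the difference between the two series concrete, here is a small stdlib-only Java sketch that computes both. BackoffSeries, delays and resetEvery are hypothetical names for illustration and are not part of Spring Retry; the values 10 and 3.0 are simply the ones from the series above.

```java
import java.util.ArrayList;
import java.util.List;

final class BackoffSeries {
    /**
     * Returns the delay before each of `attempts` retries. The delay starts at
     * `initial`, is multiplied by `multiplier` after every retry, and drops
     * back to `initial` every `resetEvery` retries (0 means never reset).
     */
    static List<Long> delays(long initial, double multiplier, int attempts, int resetEvery) {
        List<Long> result = new ArrayList<>();
        double current = initial;
        for (int i = 0; i < attempts; i++) {
            if (resetEvery > 0 && i > 0 && i % resetEvery == 0) {
                current = initial; // retries exhausted, record replayed from scratch
            }
            result.add(Math.round(current));
            current *= multiplier;
        }
        return result;
    }

    public static void main(String[] args) {
        // Reset after every 3 retries: prints [10, 30, 90, 10, 30, 90]
        System.out.println(delays(10, 3.0, 6, 3));
        // No reset, the desired ascending series: prints [10, 30, 90, 270, 810, 2430]
        System.out.println(delays(10, 3.0, 6, 0));
    }
}
```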
Is there any third scenario which could keep the delays forming an ascending series and, at the same time, not creating duplicates in the aforementioned example?
It can be done with stateful retry - in which case the exception is thrown after each retry, but state is maintained in the retry state object, so the next delivery of that message will use the next delay etc.
This requires something in the message (e.g. a header) to uniquely identify each message. Fortunately, with Kafka, the topic, partition and offset provide that unique key for the state.
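A rough illustration of that idea, using only the JDK: keep an attempt counter per record key so that every redelivery of the same record gets the next, longer delay. PerRecordBackoff and its methods are hypothetical names and this is not Spring Retry's RetryState mechanism, only a sketch of the bookkeeping it implies. The key uses delimiters so that the digits of the topic name, partition and offset cannot run together.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class PerRecordBackoff {
    private final ConcurrentMap<String, Integer> attempts = new ConcurrentHashMap<>();
    private final long initialMillis;
    private final double multiplier;

    PerRecordBackoff(long initialMillis, double multiplier) {
        this.initialMillis = initialMillis;
        this.multiplier = multiplier;
    }

    /** Builds a state key from the record coordinates. */
    static String recordKey(String topic, int partition, long offset) {
        return topic + "-" + partition + "@" + offset;
    }

    /** Returns the delay for this delivery and records that one more attempt was made. */
    long nextDelayMillis(String key) {
        int previousAttempts = attempts.merge(key, 1, Integer::sum) - 1;
        return Math.round(initialMillis * Math.pow(multiplier, previousAttempts));
    }

    /** Forgets the record once it has been processed or recovered. */
    void clear(String key) {
        attempts.remove(key);
    }

    public static void main(String[] args) {
        PerRecordBackoff backoff = new PerRecordBackoff(1000, 1.5);
        String key = recordKey("sr1", 0, 42L);
        // Three failed deliveries of the same record: 1000, 1500, 2250
        for (int i = 0; i < 3; i++) {
            System.out.println(key + " -> wait " + backoff.nextDelayMillis(key) + " ms");
        }
        backoff.clear(key); // success: the next failure starts at 1000 again
    }
}
```

Clearing the entry on success matters: without it, the map would grow with every record that ever failed.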
However, currently, the RetryingMessageListenerAdapter does not support stateful retry.
You could disable retry in the listener container factory and use a stateful RetryTemplate in your listener, using one of the execute methods that takes a RetryState argument.
Feel free to add a GitHub issue for the framework to support stateful retry; contributions are welcome! - pull request issued.
EDIT
I just wrote a test case to demonstrate using stateful recovery with a @KafkaListener...
/*
* Copyright 2018 the original author or authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.springframework.kafka.annotation;
import static org.assertj.core.api.Assertions.assertThat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.junit.ClassRule;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.config.KafkaListenerContainerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.kafka.test.rule.KafkaEmbedded;
import org.springframework.kafka.test.utils.KafkaTestUtils;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.retry.RetryState;
import org.springframework.retry.backoff.ExponentialBackOffPolicy;
import org.springframework.retry.support.DefaultRetryState;
import org.springframework.retry.support.RetryTemplate;
import org.springframework.test.annotation.DirtiesContext;
import org.springframework.test.context.junit4.SpringRunner;
/**
* @author Gary Russell
* @since 5.0
*
*/
@RunWith(SpringRunner.class)
@DirtiesContext
public class StatefulRetryTests {
private static final String DEFAULT_TEST_GROUP_ID = "statefulRetry";
@ClassRule
public static KafkaEmbedded embeddedKafka = new KafkaEmbedded(1, true, 1, "sr1");
@Autowired
private Config config;
@Autowired
private KafkaTemplate<Integer, String> template;
@Test
public void testStatefulRetry() throws Exception {
this.template.send("sr1", "foo");
assertThat(this.config.listener1().latch1.await(10, TimeUnit.SECONDS)).isTrue();
assertThat(this.config.listener1().latch2.await(10, TimeUnit.SECONDS)).isTrue();
assertThat(this.config.listener1().result).isTrue();
}
@Configuration
@EnableKafka
public static class Config {
@Bean
public KafkaListenerContainerFactory<?> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<Integer, String> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
return factory;
}
@Bean
public DefaultKafkaConsumerFactory<Integer, String> consumerFactory() {
return new DefaultKafkaConsumerFactory<>(consumerConfigs());
}
@Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> consumerProps =
KafkaTestUtils.consumerProps(DEFAULT_TEST_GROUP_ID, "false", embeddedKafka);
consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
return consumerProps;
}
@Bean
public KafkaTemplate<Integer, String> template() {
KafkaTemplate<Integer, String> kafkaTemplate = new KafkaTemplate<>(producerFactory());
return kafkaTemplate;
}
@Bean
public ProducerFactory<Integer, String> producerFactory() {
return new DefaultKafkaProducerFactory<>(producerConfigs());
}
@Bean
public Map<String, Object> producerConfigs() {
return KafkaTestUtils.producerProps(embeddedKafka);
}
@Bean
public Listener listener1() {
return new Listener();
}
}
public static class Listener {
private static final RetryTemplate retryTemplate = new RetryTemplate();
private static final ConcurrentMap<String, RetryState> states = new ConcurrentHashMap<>();
static {
ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
retryTemplate.setBackOffPolicy(backOff);
}
private final CountDownLatch latch1 = new CountDownLatch(3);
private final CountDownLatch latch2 = new CountDownLatch(1);
private volatile boolean result;
@KafkaListener(topics = "sr1", groupId = "sr1")
public void listen1(final String in, @Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
@Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition,
@Header(KafkaHeaders.OFFSET) long offset) {
String recordKey = topic + partition + offset;
RetryState retryState = states.get(recordKey);
if (retryState == null) {
retryState = new DefaultRetryState(recordKey);
states.put(recordKey, retryState);
}
this.result = retryTemplate.execute(c -> {
// do your work here
this.latch1.countDown();
throw new RuntimeException("retry");
}, c -> {
latch2.countDown();
return true;
}, retryState);
states.remove(recordKey);
}
}
}
and
Seek to current after exception; nested exception is org.springframework.kafka.listener.ListenerExecutionFailedException: Listener method 'public void org.springframework.kafka.annotation.StatefulRetryTests$Listener.listen1(java.lang.String,java.lang.String,int,long)' threw exception; nested exception is java.lang.RuntimeException: retry
after each delivery attempt.
In this case, I added a recoverer to handle the message after retries are exhausted. You could do something else, like stop the container (but do that on a separate thread, like we do in the ContainerStoppingErrorHandler).

Storm Kafka Topology terminates without output

This is the StBolt.java class.
package com.storm.cassandra;
import java.util.Map;
import net.sf.json.JSONObject;
import net.sf.json.JSONSerializer;
import org.apache.log4j.Logger;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.IBasicBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
public class StBolt implements IBasicBolt {
private static final long serialVersionUID = 1L;
private static final Logger logger = Logger
.getLogger(StBolt.class);
private static Session session = null;
private Cluster cluster = null;
String cassandraURL;
JSONObject eventJson = null;
String topicname = null;
String ip = null;
String menu = null;
String product = null;
Row row = null;
com.datastax.driver.core.ResultSet viewcount = null;
com.datastax.driver.core.ResultSet segmentlistResult = null;
com.datastax.driver.core.ResultSet newCountUpdatedResult = null;
public StBolt(String topicname) {
this.topicname = topicname;
}
public void prepare(Map stormConf, TopologyContext topologyContext) {
cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
System.out.println("load cassandra ip");
session = cluster.connect();
System.out.println("CassandraCounterBolt prepare method ended");
}
public void execute(Tuple input, BasicOutputCollector collector) {
System.out.println("Execute");
Fields fields = input.getFields();
try {
eventJson = (JSONObject) JSONSerializer.toJSON((String) input
.getValueByField(fields.get(0)));
topicname = (String) eventJson.get("topicName");
ip = (String) eventJson.get("ip");
menu = (String) eventJson.get("menu");
product = (String) eventJson.get("product");
String ievent = "ievent";
String install = "install";
viewcount = session
.execute("update webapp.viewcount set count=count+1 where topicname='"+topicname+
"'and ip= '"+ip+"'and menu='"+menu+"'and product='"+product+"'" );
} catch (Exception e) {
e.printStackTrace();
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
}
public Map<String, Object> getComponentConfiguration() {
return null;
}
public void cleanup() {
}
}
Here is the StTopology.java class
package com.storm.cassandra;
import org.apache.storm.kafka.BrokerHosts;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.TopologyBuilder;
public class StTopology {
public static void main(String[] args) throws Exception {
if (args.length == 4) {
BrokerHosts hosts = new ZkHosts("localhost:2181");
//System.out
//.println("Insufficent Arguements - topologyName kafkaTopic ZKRoot ID");
SpoutConfig kafkaConf1 = new SpoutConfig(hosts, args[1], args[2],
args[3]);
//System.out
//.println("Insufficent Arguements - topologyName kafkaTopic ZKRoot ID");
//kafkaConf1.forceFromStart = false;
kafkaConf1.zkRoot = args[2];
kafkaConf1.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout1 = new KafkaSpout(kafkaConf1);
StBolt countbolt = new StBolt(args[1]);
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafkaspout", kafkaSpout1, 1);
builder.setBolt("counterbolt", countbolt, 1).shuffleGrouping(
"kafkaspout");
Config config = new Config();
config.setDebug(true);
config.put(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS, 1);
config.setNumWorkers(1);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(args[0], config, builder.createTopology());
// StormSubmitter.submitTopology(args[0], config,
// builder.createTopology());
} else {
System.out
.println("Insufficient arguments - topologyName kafkaTopic ZKRoot ID");
}
}
}
I am trying to get JSON data from the Kafka console producer, process it in Storm and store it into Cassandra.
For some reason, there is no response from the bolt when I run the code with parameters viewcount usercount /kafkastorm webapp1.
I have Kafka getting data from the console producer as topic usercount, and the correct table in Cassandra.
The code compiles and runs without any error but the console shows terminated.
I have no activity anywhere, despite providing the right JSON input to the Kafka console producer multiple times {"topicname":"usercount","ip":"127.0.0.1","menu":"dress","product":"tshirt"}.
There is no topology shown as being created in the Storm UI's Topology Summary either.
I believe I have all the Kafka, Storm and Cassandra dependencies in place.
Please point me in the right direction with this issue. Thanks.

Apache Kafka Default Encoder Not Working

I am using Kafka 0.8 beta, and I am just trying to mess around with sending different objects, serializing them using my own encoder, and sending them to an existing broker configuration. For now I am trying to get just DefaultEncoder working.
I have the broker and everything setup and working for StringEncoder, but I am not able to get any other data type, including just pure byte[], to be sent and received by the broker.
My code for the Producer is:
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
import java.util.Date;
import java.util.Properties;
import java.util.Random;
public class ProducerTest {
public static void main(String[] args) {
long events = 5;
Random rnd = new Random();
rnd.setSeed(new Date().getTime());
Properties props = new Properties();
props.setProperty("metadata.broker.list", "localhost:9093,localhost:9094");
props.setProperty("serializer.class", "kafka.serializer.DefaultEncoder");
props.setProperty("partitioner.class", "example.producer.SimplePartitioner");
props.setProperty("request.required.acks", "1");
props.setProperty("producer.type", "async");
props.setProperty("batch.num.messages", "4");
ProducerConfig config = new ProducerConfig(props);
Producer<byte[], byte[]> producer = new Producer<byte[], byte[]>(config);
for (long nEvents = 0; nEvents < events; nEvents++) {
byte[] a = "Hello".getBytes();
byte[] b = "There".getBytes();
KeyedMessage<byte[], byte[]> data = new KeyedMessage<byte[], byte[]>("page_visits", a, b);
producer.send(data);
}
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}
producer.close();
}
}
I used the same SimplePartitioner as in the example given here, and replacing all the byte arrays by Strings and changing the serializer to kafka.serializer.StringEncoder works perfectly.
For reference, SimplePartitioner:
import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;
public class SimplePartitioner implements Partitioner<String> {
public SimplePartitioner (VerifiableProperties props) {
}
public int partition(String key, int a_numPartitions) {
int partition = 0;
int offset = key.lastIndexOf('.');
if (offset > 0) {
partition = Integer.parseInt( key.substring(offset+1)) % a_numPartitions;
}
return partition;
}
}
What am I doing wrong?
The answer is that the partitioning class SimplePartitioner is applicable only for Strings. When I try to run the Producer asynchronously, it creates a separate thread that handles the encoding and partitioning before sending to the broker. This thread hits a roadblock when it realizes that SimplePartitioner works only for Strings, but because it's a separate thread, no Exceptions are thrown, and so the thread just exits without any indication of wrongdoing.
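The mechanism described above can be reproduced with plain JDK classes. The sketch below is not Kafka's producer code: Partitioner, StringPartitioner and AsyncFailureDemo are hypothetical stand-ins. It shows a String-only partitioner receiving a byte[] key on a background thread. The resulting ClassCastException ends that thread, while the calling thread carries on as if nothing happened.

```java
import java.util.concurrent.atomic.AtomicReference;

final class AsyncFailureDemo {
    /** Hypothetical stand-in for a typed partitioner interface. */
    interface Partitioner<K> {
        int partition(K key, int numPartitions);
    }

    /** Only understands String keys, like the SimplePartitioner in the question. */
    static final class StringPartitioner implements Partitioner<String> {
        @Override
        public int partition(String key, int numPartitions) {
            return Math.abs(key.hashCode() % numPartitions);
        }
    }

    /**
     * Calls the partitioner with a byte[] key on a background thread and
     * returns the exception that ended that thread (null if there was none).
     */
    @SuppressWarnings("unchecked")
    static Throwable runOnBackgroundThread() {
        // Generic types are erased at runtime, so this cast compiles and succeeds.
        Partitioner<Object> erased = (Partitioner<Object>) (Partitioner<?>) new StringPartitioner();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread sender = new Thread(() -> erased.partition("Hello".getBytes(), 2));
        // Without a handler, the failure would only show up on stderr or in a log.
        sender.setUncaughtExceptionHandler((thread, error) -> failure.set(error));
        sender.start();
        try {
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for sender", e);
        }
        return failure.get();
    }

    public static void main(String[] args) {
        Throwable failure = runOnBackgroundThread();
        System.out.println("main thread finished normally; background failure: " + failure);
    }
}
```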
If we change the SimplePartitioner to accept byte[], for instance:
import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;
public class SimplePartitioner implements Partitioner<byte[]> {
public SimplePartitioner (VerifiableProperties props) {
}
public int partition(byte[] key, int a_numPartitions) {
int partition = 0;
return partition;
}
}
This works perfectly now.