Creating CEP with Apache Flink

I'm trying to implement a very simple Apache Flink CEP pattern for a Kafka input stream.
The Kafka producer generates simple double values and sends them as strings via a Kafka topic to the consumers. At the moment I'm writing a CEP consumer with Flink.
This is the code I have written so far:
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.getConfig().disableSysoutLogging();
env.getConfig().setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 10000));
env.setParallelism(3);
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("group.id", "flink_consumer");
DataStream<String> stream = env
.addSource(new FlinkKafkaConsumer09<>("temp", new SimpleStringSchema(), properties));
Pattern<String, ?> warning= Pattern.<String>begin("first")
.where(new IterativeCondition<String>() {
private static final long serialVersionUID = 1L;
@Override
public boolean filter(String value, Context<String> ctx) throws Exception {
return Double.parseDouble(value) >= 89.0;
}
})
.next("second")
.where(new IterativeCondition<String>() {
private static final long serialVersionUID = 1L;
@Override
public boolean filter(String value, Context<String> ctx) throws Exception {
return Double.parseDouble(value) >= 89.0;
}
})
.within(Time.seconds(10));
DataStream<String> temp = CEP.pattern(stream, warning).select(new PatternSelectFunction<String, String>() {
private static final long serialVersionUID = 1L;
@Override
public String select(Map<String, List<String>> pattern) throws Exception {
List warnung1 = pattern.get("first");
String first = (String) warnung1.get(1);
return first;
}
});
temp.print();
env.execute();
}
If I try to execute this code, this is the error message:
Exception in thread "main" java.lang.NoSuchFieldError: NO_INDEX at
org.apache.flink.cep.PatternStream.select(PatternStream.java:102) at
CEPTest.main(CEPTest.java:50)
So it looks like the DataStream generated with the CEP pattern is wrong, but I don't know what is wrong with that method. Any help would be great!
Edit: I tried some other examples and I get the same error at every execution, so I think something with my packages is wrong?

With Flink 1.6.0 my code works perfectly.
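As a side note, neither condition above uses its Context argument, so the same pattern can also be written with flink-cep's SimpleCondition; a minimal sketch based on the code in the question (same class names and threshold):
// import org.apache.flink.cep.pattern.Pattern;
// import org.apache.flink.cep.pattern.conditions.SimpleCondition;
// import org.apache.flink.streaming.api.windowing.time.Time;
Pattern<String, ?> warning = Pattern.<String>begin("first")
        .where(new SimpleCondition<String>() {
            @Override
            public boolean filter(String value) {
                // same threshold as the IterativeCondition version above
                return Double.parseDouble(value) >= 89.0;
            }
        })
        .next("second")
        .where(new SimpleCondition<String>() {
            @Override
            public boolean filter(String value) {
                return Double.parseDouble(value) >= 89.0;
            }
        })
        .within(Time.seconds(10));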

Related

Kafka: Consumer api: Regression test fails if runs in a group (sequentially)

I have implemented a Kafka application using the consumer API, and I have two regression tests implemented with the streams API:
To test the happy path: the test produces data (into the input topic that the application is listening to) which is consumed by the application, and the application produces data (into the output topic) that the test consumes and validates against the expected output data.
To test the error path: the behavior is the same as above, although this time the application produces data into the output topic and the test consumes from the application's error topic and validates it against the expected error output.
My code and the regression-test code reside in the same project under the expected directory structure. Both times (for both tests) the data should be picked up by the same listener on the application side.
The problem is:
When I execute the tests individually (manually), each test passes. However, if I execute them together but sequentially (for example: gradle clean build), only the first test passes. The second test fails after the test-side consumer polls for data and, after some time, gives up without finding any data.
Observation:
From debugging, it looks like the first time everything works perfectly (test-side and application-side producers and consumers). However, during the second test it seems that the application-side consumer is not receiving any data (the test-side producer appears to be producing data, but I cannot say that for sure), and hence no data is being produced into the error topic.
What I have tried so far:
After investigating, my understanding is that we are running into race conditions, and to avoid that I found suggestions like:
use @DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
tear down the broker after each test (please see the ".destroy()" on the brokers)
use different topic names for each test
I applied all of them and still could not resolve my issue.
I am providing the code here for perusal. Any insight is appreciated.
Code for 1st test (Testing error path):
@DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
@EmbeddedKafka(
partitions = 1,
controlledShutdown = false,
topics = {
AdapterStreamProperties.Constants.INPUT_TOPIC,
AdapterStreamProperties.Constants.ERROR_TOPIC
},
brokerProperties = {
"listeners=PLAINTEXT://localhost:9092",
"port=9092",
"log.dir=/tmp/data/logs",
"auto.create.topics.enable=true",
"delete.topic.enable=true"
}
)
public class AbstractIntegrationFailurePathTest {
private final int retryLimit = 0;
@Autowired
protected EmbeddedKafkaBroker embeddedFailurePathKafkaBroker;
//To produce data
@Autowired
protected KafkaTemplate<PreferredMediaMsgKey, SendEmailCmd> inputProducerTemplate;
//To read from output error
@Autowired
protected Consumer<PreferredMediaMsgKey, ErrorCmd> outputErrorConsumer;
//Service to execute notification-preference
@Autowired
protected AdapterStreamProperties projectProerties;
protected void subscribe(Consumer consumer, String topic, int attempt) {
try {
embeddedFailurePathKafkaBroker.consumeFromAnEmbeddedTopic(consumer, topic);
} catch (ComparisonFailure ex) {
if (attempt < retryLimit) {
subscribe(consumer, topic, attempt + 1);
}
}
}
}
.
@TestConfiguration
public class AdapterStreamFailurePathTestConfig {
@Autowired
private EmbeddedKafkaBroker embeddedKafkaBroker;
@Value("${spring.kafka.adapter.application-id}")
private String applicationId;
@Value("${spring.kafka.adapter.group-id}")
private String groupId;
//Producer of records that the program consumes
@Bean
public Map<String, Object> sendEmailCmdProducerConfigs() {
Map<String, Object> results = KafkaTestUtils.producerProps(embeddedKafkaBroker);
results.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
AdapterStreamProperties.Constants.KEY_SERDE.serializer().getClass());
results.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
AdapterStreamProperties.Constants.INPUT_VALUE_SERDE.serializer().getClass());
return results;
}
@Bean
public ProducerFactory<PreferredMediaMsgKey, SendEmailCmd> inputProducerFactory() {
return new DefaultKafkaProducerFactory<>(sendEmailCmdProducerConfigs());
}
@Bean
public KafkaTemplate<PreferredMediaMsgKey, SendEmailCmd> inputProducerTemplate() {
return new KafkaTemplate<>(inputProducerFactory());
}
//Consumer of the error output, generated by the program
@Bean
public Map<String, Object> outputErrorConsumerConfig() {
Map<String, Object> props = KafkaTestUtils.consumerProps(
applicationId, Boolean.TRUE.toString(), embeddedKafkaBroker);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
AdapterStreamProperties.Constants.KEY_SERDE.deserializer().getClass()
.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
AdapterStreamProperties.Constants.ERROR_VALUE_SERDE.deserializer().getClass()
.getName());
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
return props;
}
@Bean
public Consumer<PreferredMediaMsgKey, ErrorCmd> outputErrorConsumer() {
DefaultKafkaConsumerFactory<PreferredMediaMsgKey, ErrorCmd> rpf =
new DefaultKafkaConsumerFactory<>(outputErrorConsumerConfig());
return rpf.createConsumer(groupId, "notification-failure");
}
}
.
@RunWith(SpringRunner.class)
@SpringBootTest(classes = AdapterStreamFailurePathTestConfig.class)
@ActiveProfiles(profiles = "errtest")
public class ErrorPath400Test extends AbstractIntegrationFailurePathTest {
@Autowired
private DataGenaratorForErrorPath400Test datagen;
@Mock
private AdapterHttpClient httpClient;
@Autowired
private ErroredEmailCmdDeserializer erroredEmailCmdDeserializer;
@Before
public void setup() throws InterruptedException {
Mockito.when(httpClient.callApi(Mockito.any()))
.thenReturn(
new GenericResponse(
400,
TestConstants.ERROR_MSG_TO_CHK));
Mockito.when(httpClient.createURI(Mockito.any(),Mockito.any(),Mockito.any())).thenCallRealMethod();
inputProducerTemplate.send(
projectProerties.getInputTopic(),
datagen.getKey(),
datagen.getEmailCmdToProduce());
System.out.println("producer: "+ projectProerties.getInputTopic());
subscribe(outputErrorConsumer , projectProerties.getErrorTopic(), 0);
}
@Test
public void testWithError() throws InterruptedException, InvalidProtocolBufferException, TextFormat.ParseException {
ConsumerRecords<PreferredMediaMsgKeyBuf.PreferredMediaMsgKey, ErrorCommandBuf.ErrorCmd> records;
List<ConsumerRecord<PreferredMediaMsgKeyBuf.PreferredMediaMsgKey, ErrorCommandBuf.ErrorCmd>> outputListOfErrors = new ArrayList<>();
int attempt = 0;
int expectedRecords = 1;
do {
records = KafkaTestUtils.getRecords(outputErrorConsumer);
records.forEach(outputListOfErrors::add);
attempt++;
} while (attempt < expectedRecords && outputListOfErrors.size() < expectedRecords);
//Verify the recipient event stream size
Assert.assertEquals(expectedRecords, outputListOfErrors.size());
//Validate output
}
@After
public void tearDown() {
outputErrorConsumer.close();
embeddedFailurePathKafkaBroker.destroy();
}
}
The 2nd test is almost the same in structure, although this time the test-side consumer consumes from the application-side output topic (instead of the error topic), and I named the consumers, broker, producer, and topics differently. Like:
@DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD)
@EmbeddedKafka(
partitions = 1,
controlledShutdown = false,
topics = {
AdapterStreamProperties.Constants.INPUT_TOPIC,
AdapterStreamProperties.Constants.OUTPUT_TOPIC
},
brokerProperties = {
"listeners=PLAINTEXT://localhost:9092",
"port=9092",
"log.dir=/tmp/data/logs",
"auto.create.topics.enable=true",
"delete.topic.enable=true"
}
)
public class AbstractIntegrationSuccessPathTest {
private final int retryLimit = 0;
@Autowired
protected EmbeddedKafkaBroker embeddedKafkaBroker;
//To produce data
@Autowired
protected KafkaTemplate<PreferredMediaMsgKey,SendEmailCmd> sendEmailCmdProducerTemplate;
//To read from output regular topic
@Autowired
protected Consumer<PreferredMediaMsgKey, NotifiedEmailCmd> ouputConsumer;
//Service to execute notification-preference
@Autowired
protected AdapterStreamProperties projectProerties;
protected void subscribe(Consumer consumer, String topic, int attempt) {
try {
embeddedKafkaBroker.consumeFromAnEmbeddedTopic(consumer, topic);
} catch (ComparisonFailure ex) {
if (attempt < retryLimit) {
subscribe(consumer, topic, attempt + 1);
}
}
}
}
Please let me know if I should provide any more information.
"port=9092"
Don't use a fixed port; leave that out and the embedded broker will use a random port; the consumer configs are set up in KafkaTestUtils to point to the random port.
You shouldn't need to dirty the context after each test method - use a different group.id for each test and a different topic.
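For illustration, a sketch of how the first test's annotation could look without the fixed port (everything else stays as in the question's code; the embedded broker then binds to a random port, which KafkaTestUtils wires into the producer/consumer properties):
// No "listeners=PLAINTEXT://localhost:9092" and no "port=9092": the embedded broker picks a random port.
// @DirtiesContext is dropped; instead each test uses its own group.id and its own topics.
@EmbeddedKafka(
        partitions = 1,
        controlledShutdown = false,
        topics = {
                AdapterStreamProperties.Constants.INPUT_TOPIC,
                AdapterStreamProperties.Constants.ERROR_TOPIC
        },
        brokerProperties = {
                "log.dir=/tmp/data/logs",
                "auto.create.topics.enable=true",
                "delete.topic.enable=true"
        }
)
public class AbstractIntegrationFailurePathTest {
        // ... fields and subscribe() unchanged ...
}
// In the @TestConfiguration, give the error consumer a group.id unique to this test, e.g.
// return rpf.createConsumer(groupId + "-failure-path", "notification-failure");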
In my case the consumer was not closed properly. I had to do :
@After
public void tearDown() {
// shutdown hook to correctly close the streams application
Runtime.getRuntime().addShutdownHook(new Thread(ouputConsumer::close));
}
to resolve.

Why is windowing not working for Kafka Streams?

I am running a simple Kafka Streams program in Eclipse, and it runs successfully, but it does not implement the windowing concept the way I expect.
I want to process all the messages received in a 5-second window and send them to the output topic. I googled and understand that I need to implement a tumbling window. However, I see that the output is sent to the output topic instantly.
What am I doing wrong here? Below is the main method that I am running:
public static void main(String[] args) throws Exception {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
final StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source = builder.stream("wc-input");
@SuppressWarnings("deprecation")
KTable<Windowed<String>, Long> counts = source
.flatMapValues(new ValueMapper<String, Iterable<String>>() {
@Override
public Iterable<String> apply(String value) {
return Arrays.asList(value.toLowerCase(Locale.getDefault()).split(" "));
}
})
.groupBy(new KeyValueMapper<String, String, String>() {
@Override
public String apply(String key, String value) {
return value;
}
})
.count(TimeWindows.of(10000L)
.until(10000L),"Counts");
// need to override value serde to Long type
counts.to("wc-output");
final Topology topology = builder.build();
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-wordcount-shutdown-hook") {
@Override
public void run() {
streams.close();
latch.countDown();
}
});
try {
streams.start();
long windowSizeMs = TimeUnit.MINUTES.toMillis(50000); // 5 * 60 * 1000L
TimeWindows.of(windowSizeMs);
TimeWindows.of(windowSizeMs).advanceBy(windowSizeMs);
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
}
Windowing does not mean "one output" per window. If you want to get only one output per window, you want to use suppress() on the result KTable.
Compare this article: https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/
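For illustration, a minimal sketch of the suppress() approach (requires Kafka Streams 2.1+; topic names and the 5-second window follow the question, the surrounding Properties/KafkaStreams setup stays the same):
// import java.time.Duration;
// import java.util.Arrays;
// import org.apache.kafka.streams.KeyValue;
// import org.apache.kafka.streams.StreamsBuilder;
// import org.apache.kafka.streams.kstream.*;
final StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source = builder.stream("wc-input");
source
        .flatMapValues(value -> Arrays.asList(value.toLowerCase().split(" ")))
        .groupBy((key, word) -> word)
        .windowedBy(TimeWindows.of(Duration.ofSeconds(5)).grace(Duration.ZERO))
        .count()
        // buffer intermediate updates and emit one final count per key when the window closes
        .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
        .toStream()
        .map((windowedKey, count) -> KeyValue.pair(windowedKey.key(), count.toString()))
        .to("wc-output");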

What happens to the timestamp of a message in a stream when it's mapped into another stream?

I have an application where I process one stream and convert it into another. Here is a sample:
public void run(final String... args) {
final Serde<Event> eventSerde = new EventSerde();
final Properties props = streamingConfig.getProperties(
applicationName,
concurrency,
Serdes.String(),
eventSerde
);
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, EXACTLY_ONCE);
props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, EventTimestampExtractor.class);
final StreamsBuilder builder = new StreamsBuilder();
KStream<String, Event> eventStream = builder.stream(inputStream);
final Serde<Device> deviceSerde = new DeviceSerde();
eventStream
.map((key, event) -> {
final Device device = modelMapper.map(event, Device.class);
return new KeyValue<>(key, device);
})
.to("device_topic", Produced.with(Serdes.String(), deviceSerde));
final Topology topology = builder.build();
final KafkaStreams streams = new KafkaStreams(topology, props);
streams.start();
}
Here are some details about the app:
Spring Boot 1.5.17
Kafka 2.1.0
Kafka Streams 2.1.0
Spring Kafka 1.3.6
Although a timestamp is set in the messages of the input stream, I also use an implementation of TimestampExtractor to make sure that a proper timestamp is attached to all messages (as other producers may send messages to the same topic).
Within the code, I receive a stream of events and I basically convert them into different objects and eventually route those objects into different streams.
I'm trying to understand whether the initial timestamp I set is still attached to the messages published into device_topic in this particular case.
The receiving end (of device stream) is like this:
@KafkaListener(topics = "device_topic")
public void onDeviceReceive(final Device device, @Header(KafkaHeaders.RECEIVED_TIMESTAMP) final long timestamp) {
log.trace("[{}] Received device: {}", timestamp, device);
}
Unfortunately, the printed timestamp seems to be wall clock time. Is this the expected behaviour or am I missing something?
Spring Kafka 1.3.x uses a very old 0.11 client; perhaps it doesn't propagate the timestamp. I just tested with Boot 2.1.3 and Spring Kafka 2.2.4 and the timestamp is propagated ok...
@SpringBootApplication
@EnableKafkaStreams
public class So54771130Application {
public static void main(String[] args) {
SpringApplication.run(So54771130Application.class, args);
}
@Bean
public ApplicationRunner runner(KafkaTemplate<String, String> template) {
return args -> {
template.send("so54771130", 0, 42L, null, "baz");
};
}
@Bean
public KStream<String, String> stream(StreamsBuilder builder) {
KStream<String, String> stream = builder.stream("so54771130");
stream
.map((k, v) -> {
System.out.println("Mapping:" + v);
return new KeyValue<>(null, "bar");
})
.to("so54771130-1");
return stream;
}
@Bean
public NewTopic topic1() {
return new NewTopic("so54771130", 1, (short) 1);
}
@Bean
public NewTopic topic2() {
return new NewTopic("so54771130-1", 1, (short) 1);
}
@KafkaListener(id = "so54771130", topics = "so54771130-1")
public void listen(String in, @Header(KafkaHeaders.RECEIVED_TIMESTAMP) long ts) {
System.out.println(in + "#" + ts);
}
}
and
Mapping:baz
bar#42

sending DataStream from VM socket to Kafka and receiving on Host OS's Flink program : Deserialization Issue

I am sending a data stream from a VM to Kafka's test topic (running on the host OS at IP 192.168.0.12) using the code below:
public class WriteToKafka {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Use ingestion time => TimeCharacteristic == EventTime + IngestionTimeExtractor
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
DataStream<JoinedStreamEvent> joinedStreamEventDataStream = env
.addSource(new JoinedStreamGenerator()).assignTimestampsAndWatermarks(new IngestionTimeExtractor<>());
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "192.168.0.12:9092");
properties.setProperty("zookeeper.connect", "192.168.0.12:2181");
properties.setProperty("group.id", "test");
DataStreamSource<JoinedStreamEvent> stream = env.addSource(new JoinedStreamGenerator());
stream.addSink(new FlinkKafkaProducer09<JoinedStreamEvent>("test", new TypeInformationSerializationSchema<>(stream.getType(),env.getConfig()), properties));
env.execute();
}
JoinedStreamEvent is of type DataStream<Tuple3<Integer,Integer,Integer>>; it basically joins two streams, respirationRateStream and heartRateStream:
public JoinedStreamEvent(Integer patient_id, Integer heartRate, Integer respirationRate) {
Patient_id = patient_id;
HeartRate = heartRate;
RespirationRate = respirationRate;
There is another Flink program running on the host OS that tries to read the data stream from Kafka. I am using localhost here, as Kafka and ZooKeeper are running on the host OS.
public class ReadFromKafka {
public static void main(String[] args) throws Exception {
// create execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("zookeeper.connect", "localhost:2181");
properties.setProperty("group.id", "test");
DataStream<String> message = env.addSource(new FlinkKafkaConsumer09<String>("test", new SimpleStringSchema(), properties));
/* DataStream<JoinedStreamEvent> message = env.addSource(new FlinkKafkaConsumer09<JoinedStreamEvent>("test",
new , properties));*/
message.print();
env.execute();
} //main
} //ReadFromKafka
I am getting output something like this
I think I need to implement a deserializer for JoinedStreamEvent. Can someone please give me an idea of how I should write the deserializer for JoinedStreamEvent of type DataStream<Tuple3<Integer, Integer, Integer>>?
Please let me know if something else needs to be done.
P.S. - I thought of writing the following deserializer, but I don't think it is right:
DataStream<JoinedStreamEvent> message = env.addSource(new FlinkKafkaConsumer09<JoinedStreamEvent>("test",
new TypeInformationSerializationSchema<JoinedStreamEvent>() , properties));
I was able to receive events from the VM in the same format by writing a custom serializer and deserializer for both the VM and host OS programs, as shown below.
public class JoinSchema implements DeserializationSchema<JoinedStreamEvent> , SerializationSchema<JoinedStreamEvent> {
@Override
public JoinedStreamEvent deserialize(byte[] bytes) throws IOException {
return JoinedStreamEvent.fromstring(new String(bytes));
}
@Override
public boolean isEndOfStream(JoinedStreamEvent joinedStreamEvent) {
return false;
}
@Override
public TypeInformation<JoinedStreamEvent> getProducedType() {
return TypeExtractor.getForClass(JoinedStreamEvent.class);
}
@Override
public byte[] serialize(JoinedStreamEvent joinedStreamEvent) {
return joinedStreamEvent.toString().getBytes();
}
} //JoinSchema
Please note that you may have to write a fromstring() method in your event-type class, as I have added the fromstring() method to the JoinedStreamEvent class below:
public static JoinedStreamEvent fromstring(String line){
String[] token = line.split(",");
JoinedStreamEvent joinedStreamEvent = new JoinedStreamEvent();
Integer val1 = Integer.valueOf(token[0]);
Integer val2 = Integer.valueOf(token[1]);
Integer val3 = Integer.valueOf(token[2]);
return new JoinedStreamEvent(val1,val2,val3);
} //fromstring
Events were sent from the VM using the code below:
stream.addSink(new FlinkKafkaProducer09<JoinedStreamEvent>("test", new JoinSchema(), properties));
Events were received using the following code:
public static void main(String[] args) throws Exception {
// create execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("zookeeper.connect", "localhost:2181");
properties.setProperty("group.id", "test");
DataStream<JoinedStreamEvent> message = env.addSource(new FlinkKafkaConsumer09<JoinedStreamEvent>("test",
new JoinSchema(), properties));
message.print();
env.execute();
} //main
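Alternatively, if the VM and host programs share the exact same JoinedStreamEvent class and Flink version, the built-in TypeInformationSerializationSchema that the producer side already used should also work on the consumer side; a sketch (constructor arguments as on the producer side above):
// import org.apache.flink.api.java.typeutils.TypeExtractor;
// import org.apache.flink.streaming.util.serialization.TypeInformationSerializationSchema;
// The schema needs the event's TypeInformation plus the ExecutionConfig of the consuming job;
// it uses Flink's own binary format, so producer and consumer must agree on the class and Flink version.
TypeInformationSerializationSchema<JoinedStreamEvent> schema =
        new TypeInformationSerializationSchema<>(TypeExtractor.getForClass(JoinedStreamEvent.class), env.getConfig());
DataStream<JoinedStreamEvent> message = env.addSource(
        new FlinkKafkaConsumer09<>("test", schema, properties));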

Why does using the Kryo serialization framework in Apache Storm overwrite data when the bolt gets the values?

Most developers probably use Avro as the serialization framework in their Kafka and Apache Storm schemes. But I need to handle more complex data, so I chose the Kryo serialization framework and successfully integrated it into our project, which runs on Kafka and Apache Storm. However, when going one step further, I ran into strange behavior.
I sent 5 messages to Kafka, and the Storm job can read the 5 messages and deserialize them successfully. But the data value that the next bolt gets is wrong: it prints the same value as the last message. I then added a print statement right after the deserialization code completes, and there it actually prints 5 different messages. Why doesn't the next bolt get the correct values? See my code below:
KryoScheme.java
public abstract class KryoScheme<T> implements Scheme {
private static final long serialVersionUID = 6923985190833960706L;
private static final Logger logger = LoggerFactory.getLogger(KryoScheme.class);
private Class<T> clazz;
private Serializer<T> serializer;
public KryoScheme(Class<T> clazz, Serializer<T> serializer) {
this.clazz = clazz;
this.serializer = serializer;
}
@Override
public List<Object> deserialize(byte[] buffer) {
Kryo kryo = new Kryo();
kryo.register(clazz, serializer);
T scheme = null;
try {
scheme = kryo.readObject(new Input(new ByteArrayInputStream(buffer)), this.clazz);
logger.info("{}", scheme);
} catch (Exception e) {
String errMsg = String.format("Kryo Scheme failed to deserialize data from Kafka to %s. Raw: %s",
clazz.getName(),
new String(buffer));
logger.error(errMsg, e);
throw new FailedException(errMsg, e);
}
return new Values(scheme);
}}
PrintFunction.java
public class PrintFunction extends BaseFunction {
private static final Logger logger = LoggerFactory.getLogger(PrintFunction.class);
@Override
public void execute(TridentTuple tuple, TridentCollector collector) {
List<Object> data = tuple.getValues();
if (data != null) {
logger.info("Scheme data size: {}", data.size());
for (Object value : data) {
PrintOut out = (PrintOut) value;
logger.info("{}.{}--value: {}",
Thread.currentThread().getName(),
Thread.currentThread().getId(),
out.toString());
collector.emit(new Values(out));
}
}
}}
StormLocalTopology.java
public class StormLocalTopology {
public static void main(String[] args) {
........
BrokerHosts zk = new ZkHosts("xxxxxx");
Config stormConf = new Config();
stormConf.put(Config.TOPOLOGY_DEBUG, false);
stormConf.put(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS, 1000 * 5);
stormConf.put(Config.TOPOLOGY_WORKERS, 1);
stormConf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 5);
stormConf.put(Config.TOPOLOGY_TASKS, 1);
TridentKafkaConfig actSpoutConf = new TridentKafkaConfig(zk, topic);
actSpoutConf.fetchSizeBytes = 5 * 1024 * 1024 ;
actSpoutConf.bufferSizeBytes = 5 * 1024 * 1024 ;
actSpoutConf.scheme = new SchemeAsMultiScheme(scheme);
actSpoutConf.startOffsetTime = kafka.api.OffsetRequest.LatestTime();
TridentTopology topology = new TridentTopology();
TransactionalTridentKafkaSpout actSpout = new TransactionalTridentKafkaSpout(actSpoutConf);
topology.newStream(topic, actSpout).parallelismHint(4).shuffle()
.each(new Fields("act"), new PrintFunction(), new Fields());
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(topic+"Topology", stormConf, topology.build());
}}
There is also another problem: why can the Kryo scheme only read one message buffer at a time? Is there another way to get multiple message buffers so that the data can be sent to the next bolt in batches?
Also, if I send 1 message, the full flow seems to succeed.
But when I send 2 messages it goes wrong; the printed messages look like below:
56157 [Thread-18-spout0] INFO s.s.a.s.s.c.KryoScheme - 2016-02-05T17:20:48.122+0800,T6mdfEW#N5pEtNBW
56160 [Thread-20-b-0] INFO s.s.a.s.s.PrintFunction - Scheme data size: 1
56160 [Thread-18-spout0] INFO s.s.a.s.s.c.KryoScheme - 2016-02-05T17:20:48.282+0800,T(o2KnFxtGB0Tlp8
56161 [Thread-20-b-0] INFO s.s.a.s.s.PrintFunction - Thread-20-b-0.99--value: 2016-02-05T17:20:48.282+0800,T(o2KnFxtGB0Tlp8
56162 [Thread-20-b-0] INFO s.s.a.s.s.PrintFunction - Scheme data size: 1
56162 [Thread-20-b-0] INFO s.s.a.s.s.PrintFunction - Thread-20-b-0.99--value: 2016-02-05T17:20:48.282+0800,T(o2KnFxtGB0Tlp8
I'm sorry, this was my mistake. I just found a bug in my Kryo deserializer class: there was a field at instance scope, so it could be overwritten in a multi-threaded environment. After no longer keeping that parameter at class scope, the code runs well.
Reference code below:
public class KryoSerializer<T extends BasicEvent> extends Serializer<T> implements Serializable {
private static final long serialVersionUID = -4684340809824908270L;
// It was wrong to keep the event as an instance field: the serializer instance is shared,
// so the field gets overwritten when several tuples are deserialized concurrently.
//private T event;
//public KryoSerializer(T event) {
//    this.event = event;
//}
@Override
public void write(Kryo kryo, Output output, T event) {
event.write(output);
}
@Override
public T read(Kryo kryo, Input input, Class<T> type) {
// create a fresh event per call instead of reusing shared state
// (note: "new T()" is not valid Java for a type parameter, so let Kryo instantiate the class)
T event = kryo.newInstance(type);
event.read(input);
return event;
}
}