Mixing Kafka Streams DSL with Processor API to get offset - apache-kafka

I am trying to find a way to log the offset when an exception occurs.
Here is what I am trying to achieve:
void createTopology(StreamsBuilder builder) {
    builder.stream(topic, Consumed.with(Serdes.String(), new JsonSerde()))
            .filter(...)
            .mapValues(value -> {
                Map<String, Object> output;
                try {
                    output = decode(value.get("data"));
                } catch (DecodingException e) {
                    LOGGER.error(e.getMessage());
                    // TODO: LOG OFFSET FOR FAILED DECODE HERE
                    return new ArrayList<>();
                }
                ...
                return output;
            })
            .filter((k, v) -> !(v instanceof List && ((List<?>) v).isEmpty()))
            .to(sink_topic);
}
I found this: https://docs.confluent.io/platform/current/streams/developer-guide/dsl-api.html#streams-developer-guide-dsl-transformations-stateful
and my understanding is that I need to use the Processor API, but I still haven't found a solution for my issue.

A ValueTransformer can also access the offset via the ProcessorContext passed to init, and I believe that is much easier.

Here is the solution, as suggested by IUSR: https://stackoverflow.com/a/73465691/14945779 (thank you):
static class InjectOffsetTransformer implements ValueTransformer<JsonObject, JsonObject> {

    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
    }

    @Override
    public JsonObject transform(JsonObject value) {
        value.addProperty("offset", context.offset());
        return value;
    }

    @Override
    public void close() {
    }
}

void createTopology(StreamsBuilder builder) {
    builder.stream(topic, Consumed.with(Serdes.String(), new JsonSerde()))
            .filter(...)
            .transformValues(InjectOffsetTransformer::new)
            .mapValues(value -> {
                Map<String, Object> output;
                try {
                    output = decode(value.get("data"));
                } catch (DecodingException e) {
                    LOGGER.warn(String.format("Error reading from topic %s. Last read offset %s:", topic, lastReadOffset), e);
                    return new ArrayList<>();
                }
                lastReadOffset = value.get("offset").getAsLong();
                return output;
            })
            .filter((k, v) -> !(v instanceof List && ((List<?>) v).isEmpty()))
            .to(sink_topic);
}
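Side note, not part of the original answer: on newer Kafka Streams releases (3.3+), where transformValues/ValueTransformer are deprecated, the same idea can be expressed with processValues and a FixedKeyProcessor, which exposes the offset through the record metadata. A minimal sketch, with the class name InjectOffsetProcessor chosen here purely for illustration:

// Sketch only: offset injection with the newer FixedKeyProcessor API (Kafka Streams 3.3+).
static class InjectOffsetProcessor implements FixedKeyProcessor<String, JsonObject, JsonObject> {

    private FixedKeyProcessorContext<String, JsonObject> context;

    @Override
    public void init(FixedKeyProcessorContext<String, JsonObject> context) {
        this.context = context;
    }

    @Override
    public void process(FixedKeyRecord<String, JsonObject> record) {
        // recordMetadata() can be empty (e.g. for punctuation-generated records), so guard the access.
        context.recordMetadata().ifPresent(meta -> record.value().addProperty("offset", meta.offset()));
        context.forward(record);
    }
}

It would be wired into the topology with .processValues(InjectOffsetProcessor::new) instead of .transformValues(InjectOffsetTransformer::new).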

Related

How to read the Header values in the Batch listener error handling scenario

I am trying to handle the exception at the listener:
@KafkaListener(id = PropertiesUtil.ID,
        topics = "#{'${kafka.consumer.topic}'}",
        groupId = "${kafka.consumer.group.id.config}",
        containerFactory = "containerFactory",
        errorHandler = "errorHandler")
public void receiveEvents(@Payload List<ConsumerRecord<String, String>> recordList,
                          Acknowledgment acknowledgment) {
    try {
        log.info("Consuming the batch of size {} from kafka topic {}", recordList.size(),
                recordList.get(0).topic());
        processEvent(recordList);
        incrementOffset(acknowledgment);
    } catch (Exception exception) {
        throwOrHandleExceptions(exception, recordList, acknowledgment);
        .........
    }
}
The Kafka container config:
@Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>>
        containerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConcurrency(this.numberOfConsumers);
    factory.getContainerProperties().setAckOnError(false);
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
    factory.setConsumerFactory(getConsumerFactory());
    factory.setBatchListener(true);
    return factory;
}
The listener error handler implementation:
@Bean
public ConsumerAwareListenerErrorHandler errorHandler() {
    return (m, e, c) -> {
        MessageHeaders headers = m.getHeaders();
        List<String> topics = headers.get(KafkaHeaders.RECEIVED_TOPIC, List.class);
        List<Integer> partitions = headers.get(KafkaHeaders.RECEIVED_PARTITION_ID, List.class);
        List<Long> offsets = headers.get(KafkaHeaders.OFFSET, List.class);
        Map<TopicPartition, Long> offsetsToReset = new HashMap<>();
        for (int i = 0; i < topics.size(); i++) {
            int index = i;
            offsetsToReset.compute(new TopicPartition(topics.get(i), partitions.get(i)),
                    (k, v) -> v == null ? offsets.get(index) : Math.min(v, offsets.get(index)));
        }
        ...
    };
}
When I run the same code without batch processing I am able to fetch the partition, topic and offset values, but when I enable batch processing and test it I only get two values inside the headers, i.e. id and timestamp, and the other values are not set. Am I missing anything here?
What version are you using? I just tested it with Boot 2.2.4 (SK 2.3.5) and it works fine...
@SpringBootApplication
public class So60152179Application {

    public static void main(String[] args) {
        SpringApplication.run(So60152179Application.class, args);
    }

    @KafkaListener(id = "so60152179", topics = "so60152179", errorHandler = "eh")
    public void listen(List<String> in) {
        throw new RuntimeException("foo");
    }

    @Bean
    public ConsumerAwareListenerErrorHandler eh() {
        return (m, e, c) -> {
            System.out.println(m);
            return null;
        };
    }

    @Bean
    public ApplicationRunner runner(KafkaTemplate<String, String> template) {
        return args -> {
            template.send("so60152179", "foo");
        };
    }

    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("so60152179").partitions(1).replicas(1).build();
    }
}
spring.kafka.listener.type=batch
spring.kafka.consumer.auto-offset-reset=earliest
GenericMessage [payload=[foo], headers={kafka_offset=[0], kafka_nativeHeaders=[RecordHeaders(headers = [], isReadOnly = false)], kafka_consumer=org.apache.kafka.clients.consumer.KafkaConsumer#2f2e787f, kafka_timestampType=[CREATE_TIME], kafka_receivedMessageKey=[null], kafka_receivedPartitionId=[0], kafka_receivedTopic=[so60152179], kafka_receivedTimestamp=[1581351585253], kafka_groupId=so60152179}]

How can we club all the values for same key as a list and return Kafka Streams with key and value as String

I have data coming on a Kafka topic as (key:id, {id:1, body:...}),
meaning the key for the message is the same as the id. However, there can be multiple messages with the same id but a different body,
so I am getting a KStream<String, String>.
Now I want to get all the messages having the same id (key), club all the values together as a list and return a
KStream<String, List<String>>
Any suggestions?
// Create a Stream with a state store
StreamsBuilder builder = new StreamsBuilder();
StoreBuilder<KeyValueStore<String, List<String>>> logTracerStateStore = Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore(LOG_TRACE_STATE_STORE), Serdes.String(),
        new ListSerde<String>(Serdes.String()));
// add this to the stream builder
builder.addStateStore(logTracerStateStore);
KStream<String, String> kafkaStream = builder.stream(TOPIC);
splitProcessor(kafkaStream);
logger.info("creating stream for topic {} ..", TOPIC);
final Topology topology = builder.build();
return new KafkaStreams(topology, streamConfiguration(bootstrapServers));
// Stream List Serde
public class ListSerde<T> implements Serde<List<T>> {

    private final Serde<List<T>> inner;

    public ListSerde(final Serde<T> avroSerde) {
        inner = Serdes.serdeFrom(new ListSerializer<>(avroSerde.serializer()),
                new ListDeserializer<>(avroSerde.deserializer()));
    }

    @Override
    public Serializer<List<T>> serializer() {
        return inner.serializer();
    }

    @Override
    public Deserializer<List<T>> deserializer() {
        return inner.deserializer();
    }

    @Override
    public void configure(final Map<String, ?> configs, final boolean isKey) {
        inner.serializer().configure(configs, isKey);
        inner.deserializer().configure(configs, isKey);
    }

    @Override
    public void close() {
        inner.serializer().close();
        inner.deserializer().close();
    }
}
// Serializer & deserializers
public class ListSerializer<T> implements Serializer<List<T>> {

    // private final Comparator<T> comparator;
    private final Serializer<T> valueSerializer;

    public ListSerializer(final Serializer<T> valueSerializer) {
        // this.comparator = comparator;
        this.valueSerializer = valueSerializer;
    }

    @Override
    public void configure(final Map<String, ?> configs, final boolean isKey) {
        // do nothing
    }

    @Override
    public byte[] serialize(final String topic, final List<T> list) {
        final int size = list.size();
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        final DataOutputStream out = new DataOutputStream(baos);
        final Iterator<T> iterator = list.iterator();
        try {
            out.writeInt(size);
            while (iterator.hasNext()) {
                final byte[] bytes = valueSerializer.serialize(topic, iterator.next());
                out.writeInt(bytes.length);
                out.write(bytes);
            }
            out.close();
        } catch (final IOException e) {
            throw new RuntimeException("unable to serialize List", e);
        }
        return baos.toByteArray();
    }

    @Override
    public void close() {
    }
}
// ------------
public class ListDeserializer<T> implements Deserializer<List<T>> {

    // private final Comparator<T> comparator;
    private final Deserializer<T> valueDeserializer;

    public ListDeserializer(final Deserializer<T> valueDeserializer) {
        // this.comparator = comparator;
        this.valueDeserializer = valueDeserializer;
    }

    @Override
    public void configure(final Map<String, ?> configs, final boolean isKey) {
        // do nothing
    }

    @Override
    public List<T> deserialize(final String s, final byte[] bytes) {
        if (bytes == null || bytes.length == 0) {
            return null;
        }
        final List<T> list = new ArrayList<>();
        final DataInputStream dataInputStream = new DataInputStream(new ByteArrayInputStream(bytes));
        try {
            final int records = dataInputStream.readInt();
            for (int i = 0; i < records; i++) {
                final byte[] valueBytes = new byte[dataInputStream.readInt()];
                // readFully guarantees the whole value is read (read() may return fewer bytes)
                dataInputStream.readFully(valueBytes);
                list.add(valueDeserializer.deserialize(s, valueBytes));
            }
        } catch (final IOException e) {
            throw new RuntimeException("Unable to deserialize List", e);
        } finally {
            try {
                dataInputStream.close();
            } catch (Exception e2) {
                // ignore
            }
        }
        return list;
    }

    @Override
    public void close() {
    }
}
/// Now create Stream Processors
public class LogTraceStreamStateProcessor implements Processor<String, String> {

    private static final Logger logger = Logger.getLogger(LogTraceStreamStateProcessor.class);

    IStore stateStore;

    /**
     * Initialize the transformer.
     */
    @Override
    public void init(ProcessorContext context) {
        logger.info("initializing processor and looking for monitoring store");
        stateStore = MonitoringStateStoreFactory.getInstance().getStore();
        logger.debug("found the monitoring store - {} ", stateStore);
        stateStore.initLogTraceStoreProcess(context);
        logger.debug("initializing monitoring store.");
    }

    @Override
    public void process(String key, String value) {
        logger.debug("Storing the value for logtrace storage - {} ", value);
        stateStore.storeLogTrace(value);
        logger.debug("finished Storing the value for logtrace storage - {} ", value);
    }

    @Override
    public void close() {
    }
}
// access the key value state store like below
KeyValueStore<String, List<String>> stateStore = (KeyValueStore<String, List<String>>) traceStreamContext.getStateStore(EXEID_REQ_REL_STORE);
//Now add a list to new key for a new message and if the key exists then add a new message in the list
public void storeTraceData(String traceData) {
    try {
        TraceEvent tracer = new TraceEvent();
        logger.debug("Received the Trace value - {}", traceData);
        tracer = mapper.readValue(traceData, TraceEvent.class);
        logger.debug("trace unmarshalling has been completed successfully !!!");
        String key = tracer.getExecutionId();
        List<String> listEvents = stateStore.get(key);
        if (listEvents != null && !listEvents.isEmpty()) {
            logger.debug("event is already in store so storing in the list for execution id - {}", key);
            listEvents.add(requestId);
            stateStore.put(key, listEvents);
        } else {
            logger.debug(
                    "event is not present in the store so creating a new list and adding into store for execution id - {}",
                    key);
            List<String> list = new ArrayList<>();
            list.add(requestId);
            stateStore.put(key, list);
        }
    } catch (Throwable e) {
        logger.error("exception while processing the trace event .. ", e);
    } finally {
        try {
            traceStreamContext.commit();
        } catch (Exception e2) {
            e2.printStackTrace();
        }
    }
}
// now this is how you can access the message from the state store
public ReadOnlyKeyValueStore<String, List<String>> tracerStore() {
    return waitUntilStoreIsQueryable(KEY_NAME);
}
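As an aside (not part of the original answer): if a custom Processor and state store are not strictly required, the same grouping can be sketched with the plain DSL, reusing the ListSerde defined above. groupByKey may need Serialized.with(...) or Grouped.with(...) depending on the Kafka Streams version, so treat this as a sketch only:

// Sketch only: aggregate every value for a key into a List<String> with the DSL.
KTable<String, List<String>> grouped = builder.<String, String>stream(TOPIC)
        .groupByKey()
        .aggregate(
                ArrayList::new,                                          // start with an empty list per key
                (key, value, list) -> { list.add(value); return list; }, // append each incoming value
                Materialized.with(Serdes.String(), new ListSerde<>(Serdes.String())));
KStream<String, List<String>> result = grouped.toStream();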

Apache beam IOException in decoder

I have a simple pipeline that reads from Kafka with the KafkaIO reader and then applies further transforms in the pipeline. In the end, it writes to GCP in Avro format. When I run the pipeline on Dataflow it works perfectly, but when the runner is DirectRunner it reads all the data from the topics and then throws the following exception.
java.lang.IllegalArgumentException: Forbidden IOException when reading from InputStream
at org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:118)
at org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:98)
at org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:92)
at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:141)
at org.apache.beam.runners.direct.CloningBundleFactory$CloningBundle.add(CloningBundleFactory.java:84)
at org.apache.beam.runners.direct.GroupAlsoByWindowEvaluatorFactory$OutputWindowedValueToBundle.outputWindowedValue(GroupAlsoByWindowEvaluatorFactory.java:251)
at org.apache.beam.runners.direct.GroupAlsoByWindowEvaluatorFactory$OutputWindowedValueToBundle.outputWindowedValue(GroupAlsoByWindowEvaluatorFactory.java:237)
at org.apache.beam.repackaged.direct_java.runners.core.ReduceFnRunner.lambda$onTrigger$1(ReduceFnRunner.java:1057)
at org.apache.beam.repackaged.direct_java.runners.core.ReduceFnContextFactory$OnTriggerContextImpl.output(ReduceFnContextFactory.java:438)
at org.apache.beam.repackaged.direct_java.runners.core.SystemReduceFn.onTrigger(SystemReduceFn.java:125)
at org.apache.beam.repackaged.direct_java.runners.core.ReduceFnRunner.onTrigger(ReduceFnRunner.java:1060)
at org.apache.beam.repackaged.direct_java.runners.core.ReduceFnRunner.onTimers(ReduceFnRunner.java:768)
at org.apache.beam.runners.direct.GroupAlsoByWindowEvaluatorFactory$GroupAlsoByWindowEvaluator.processElement(GroupAlsoByWindowEvaluatorFactory.java:185)
at org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
at org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
at org.apache.beam.sdk.util.VarInt.decodeLong(VarInt.java:73)
at org.apache.beam.sdk.coders.IterableLikeCoder.decode(IterableLikeCoder.java:136)
at org.apache.beam.sdk.coders.IterableLikeCoder.decode(IterableLikeCoder.java:60)
at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:82)
at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:36)
at org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:115)
... 19 more
I use a custom serializer and deserializer for reading Avro and getting the payload.
Kafka Reader
private PTransform<PBegin, PCollection<KV<String, AvroGenericRecord>>> createKafkaRead(Map<String, Object> configUpdates) {
    return KafkaIO.<String, AvroGenericRecord>read()
            .withBootstrapServers(bootstrapServers)
            .withConsumerConfigUpdates(configUpdates)
            .withTopics(inputTopics)
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializerAndCoder(BeamKafkaAvroGenericDeserializer.class, AvroGenericCoder.of(serDeConfig()))
            .withMaxNumRecords(maxNumRecords)
            .commitOffsetsInFinalize()
            .withoutMetadata();
}
AvroGenericCoder
public class AvroGenericCoder extends CustomCoder<AvroGenericRecord> {

    private final Map<String, Object> config;
    private transient BeamKafkaAvroGenericDeserializer deserializer;
    private transient BeamKafkaAvroGenericSerializer serializer;

    public static AvroGenericCoder of(Map<String, Object> config) {
        return new AvroGenericCoder(config);
    }

    protected AvroGenericCoder(Map<String, Object> config) {
        this.config = config;
    }

    private BeamKafkaAvroGenericDeserializer getDeserializer() {
        if (deserializer == null) {
            BeamKafkaAvroGenericDeserializer d = new BeamKafkaAvroGenericDeserializer();
            d.configure(config, false);
            deserializer = d;
        }
        return deserializer;
    }

    private BeamKafkaAvroGenericSerializer getSerializer() {
        if (serializer == null) {
            serializer = new BeamKafkaAvroGenericSerializer();
        }
        return serializer;
    }

    @Override
    public void encode(AvroGenericRecord record, OutputStream outStream) {
        getSerializer().serialize(record, outStream);
    }

    @Override
    public AvroGenericRecord decode(InputStream inStream) {
        try {
            return getDeserializer().deserialize(null, IOUtils.toByteArray(inStream));
        } catch (IOException e) {
            throw new RuntimeException("Error translating into bytes ", e);
        }
    }

    @Override
    public void verifyDeterministic() {
    }

    @Override
    public Object structuralValue(AvroGenericRecord value) {
        return super.structuralValue(value);
    }

    @Override
    public int hashCode() {
        return HashCodeBuilder.reflectionHashCode(this);
    }

    @Override
    public boolean equals(Object obj) {
        return EqualsBuilder.reflectionEquals(this, obj);
    }
}
This is the main pipeline:
PCollection<AvroGenericRecord> records = p.apply(readKafkaTr)
        .apply(Window.<AvroGenericRecord>into(FixedWindows.of(Duration.standardMinutes(options.getWindowInMinutes())))
                .triggering(AfterWatermark.pastEndOfWindow()
                        .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                                .plusDelayOf(Duration.standardMinutes(options.getWindowInMinutes())))
                        .withLateFirings(AfterPane.elementCountAtLeast(options.getElementsCountToWaitAfterWatermark())))
                .withAllowedLateness(Duration.standardDays(options.getAfterWatermarkInDays()))
                .discardingFiredPanes()
        );

records.apply(Filter.by((ProcessFunction<AvroGenericRecord, Boolean>) Objects::nonNull))
        .apply(new WriteAvroFilesTr(options.getBasePath(), options.getNumberOfShards()));
Yes, I think @RyanSkraba is right: DirectRunner does many things that not all other runners do (the initial goal of DirectRunner was to be used for testing, so it performs many additional checks compared to other runners).
By the way, why not use Beam's AvroCoder in this case? Here is a simple example of how to use it with KafkaIO:
https://github.com/aromanenko-dev/beam-issues/blob/master/kafka-io/src/main/java/KafkaAvro.java
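A minimal sketch of that suggestion, under the assumption that the values are plain Avro GenericRecords with a known schema; schemaJsonString and the exact wiring are illustrative, see the linked example for a complete pipeline:

// Sketch only: register Beam's built-in AvroCoder for GenericRecord instead of a hand-written CustomCoder.
Pipeline p = Pipeline.create(options);
org.apache.avro.Schema schema = new org.apache.avro.Schema.Parser().parse(schemaJsonString);
p.getCoderRegistry().registerCoderForClass(GenericRecord.class, AvroCoder.of(GenericRecord.class, schema));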

How to evaluate consuming time in kafka stream application

I have a Kafka Streams 1.0.0 application with the two classes below, FilterByPolicyStreamsApp and FilterByPolicyTransformerSupplier. In my application, I read the events, perform some conditional checks and forward them to the same Kafka cluster on another topic. I am able to get the producing time with the eventsForwardTimeInMs variable in the FilterByPolicyTransformerSupplier class, but I am unable to get the consuming time (with and without (de)serialization). How can I get this time? Please help me.
FilterByPolicyStreamsApp.java:
public class FilterByPolicyStreamsApp implements CommandLineRunner {

    String policyKafkaTopicName = "policy";
    String policyFilterDataKafkaTopicName = "policy.filter.data";
    String bootstrapServers = "11.1.1.1:9092";
    String sampleEventsKafkaTopicName = "sample-.*";
    String applicationId = "filter-by-policy-app";
    String policyFilteredEventsKafkaTopicName = "policy.filter.events";

    public static void main(String[] args) {
        SpringApplication.run(FilterByPolicyStreamsApp.class, args);
    }

    @Override
    public void run(String... arg0) {
        String policyGlobalTableName = policyKafkaTopicName + ".table";
        String policyFilterDataGlobalTable = policyFilterDataKafkaTopicName + ".table";

        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        config.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, WallclockTimestampExtractor.class);

        KStreamBuilder builder = new KStreamBuilder();
        builder.globalTable(Serdes.String(), new JsonSerde<>(List.class), policyKafkaTopicName,
                policyGlobalTableName);
        builder.globalTable(Serdes.String(), new JsonSerde<>(PolicyFilterData.class), policyFilterDataKafkaTopicName,
                policyFilterDataGlobalTable);

        KStream<String, SampleEvent> events = builder.stream(Serdes.String(),
                new JsonSerde<>(SampleEvent.class), Pattern.compile(sampleEventsKafkaTopicName));
        events = events.transform(new FilterByPolicyTransformerSupplier(policyGlobalTableName,
                policyFilterDataGlobalTable));
        events.to(Serdes.String(), new JsonSerde<>(SampleEvent.class), policyFilteredEventsKafkaTopicName);

        KafkaStreams streams = new KafkaStreams(builder, config);
        streams.start();
        streams.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            @Override
            public void uncaughtException(Thread t, Throwable e) {
                logger.error(e.getMessage(), e);
            }
        });
    }
}
FilterByPolicyTransformerSupplier.java:
public class FilterByPolicyTransformerSupplier
        implements TransformerSupplier<String, SampleEvent, KeyValue<String, SampleEvent>> {

    private String policyGlobalTableName;
    private String policyFilterDataGlobalTable;

    public FilterByPolicyTransformerSupplier(String policyGlobalTableName,
            String policyFilterDataGlobalTable) {
        this.policyGlobalTableName = policyGlobalTableName;
        this.policyFilterDataGlobalTable = policyFilterDataGlobalTable;
    }

    @Override
    public Transformer<String, SampleEvent, KeyValue<String, SampleEvent>> get() {
        return new Transformer<String, SampleEvent, KeyValue<String, SampleEvent>>() {

            private KeyValueStore<String, List<String>> policyStore;
            private KeyValueStore<String, PolicyFilterData> policyMetadataStore;
            private ProcessorContext context;

            @Override
            public void close() {
            }

            @Override
            public void init(ProcessorContext context) {
                this.context = context;
                // Call punctuate every 1 second
                this.context.schedule(1000);
                policyStore = (KeyValueStore<String, List<String>>) this.context
                        .getStateStore(policyGlobalTableName);
                policyMetadataStore = (KeyValueStore<String, PolicyFilterData>) this.context
                        .getStateStore(policyFilterDataGlobalTable);
            }

            @Override
            public KeyValue<String, SampleEvent> punctuate(long arg0) {
                return null;
            }

            @Override
            public KeyValue<String, SampleEvent> transform(String key, SampleEvent event) {
                long eventsForwardTimeInMs = 0;
                long forwardedEventCount = 0;
                List<String> policyIds = policyStore.get(event.getCustomerCode().toLowerCase());
                if (policyIds != null) {
                    for (String policyId : policyIds) {
                        /*
                        PolicyFilterData policyFilterMetadata = policyMetadataStore.get(policyId);
                        Do some condition checks on the event. If it satisfies them, forward the event.
                        if (policyFilterMetadata == null) {
                            continue;
                        }
                        */
                        // Using context.forward as an event can map to multiple policies
                        long startForwardTime = System.currentTimeMillis();
                        context.forward(policyId, event);
                        forwardedEventCount++;
                        eventsForwardTimeInMs += System.currentTimeMillis() - startForwardTime;
                    }
                }
                return null;
            }
        };
    }
}
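Not from the original post, but one possible direction: Kafka Streams already exposes per-operation latency metrics (for example poll, process and commit latency at the thread level), so instead of timing calls by hand you can read them from the running KafkaStreams instance. Metric names and recording levels vary by version, so treat this as a sketch only:

// Sketch only: log the built-in latency metrics after streams.start().
for (Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
    MetricName name = entry.getKey();
    if (name.name().endsWith("-latency-avg")) {
        // On older client versions this may be entry.getValue().value() instead of metricValue().
        logger.info("{} [{}] = {}", name.name(), name.group(), entry.getValue().metricValue());
    }
}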

Flink Kafka Consumer throws Null Pointer Exception when using DataStream key by

I am using this Flink CEP example, but I have separated the data flow: I created one application that sends to Kafka and another application that reads from Kafka. I generated the producer for the class TemperatureWarning, i.e. I was sending data related to TemperatureWarning to Kafka. The following is my code which consumes data from Kafka:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
env.enableCheckpointing(5000);

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "PUBLICDNS:9092");
properties.setProperty("zookeeper.connect", "PUBLICDNS:2181");
properties.setProperty("group.id", "test");

DataStream<TemperatureWarning> dstream = env.addSource(new FlinkKafkaConsumer09<TemperatureWarning>("MonitoringEvent", new MonitoringEventSchema(), properties));

Pattern<TemperatureWarning, ?> alertPattern = Pattern.<TemperatureWarning>begin("first")
        .next("second")
        .within(Time.seconds(20));

PatternStream<TemperatureWarning> alertPatternStream = CEP.pattern(
        dstream.keyBy("rackID"),
        alertPattern);

DataStream<TemperatureAlert> alerts = alertPatternStream.flatSelect(
        (Map<String, TemperatureWarning> pattern, Collector<TemperatureAlert> out) -> {
            TemperatureWarning first = pattern.get("first");
            TemperatureWarning second = pattern.get("second");
            if (first.getAverageTemperature() < second.getAverageTemperature()) {
                out.collect(new TemperatureAlert(second.getRackID(), second.getAverageTemperature(), second.getTimeStamp()));
            }
        });

dstream.print();
alerts.print();
env.execute("Flink Kafka Consumer");
But when I execute this application, it throws the following exception:
Exception in thread "main" java.lang.NullPointerException
at org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:329)
at org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:274)
at com.yash.consumer.KafkaFlinkConsumer.main(KafkaFlinkConsumer.java:49)
The following is my class TemperatureWarning:
public class TemperatureWarning {

    private int rackID;
    private double averageTemperature;
    private long timeStamp;

    public TemperatureWarning(int rackID, double averageTemperature, long timeStamp) {
        this.rackID = rackID;
        this.averageTemperature = averageTemperature;
        this.timeStamp = timeStamp;
    }

    public TemperatureWarning() {
        this(-1, -1, -1);
    }

    public int getRackID() {
        return rackID;
    }

    public void setRackID(int rackID) {
        this.rackID = rackID;
    }

    public double getAverageTemperature() {
        return averageTemperature;
    }

    public void setAverageTemperature(double averageTemperature) {
        this.averageTemperature = averageTemperature;
    }

    public long getTimeStamp() {
        return timeStamp;
    }

    public void setTimeStamp(long timeStamp) {
        this.timeStamp = timeStamp;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof TemperatureWarning) {
            TemperatureWarning other = (TemperatureWarning) obj;
            return rackID == other.rackID && averageTemperature == other.averageTemperature;
        } else {
            return false;
        }
    }

    @Override
    public int hashCode() {
        return 41 * rackID + Double.hashCode(averageTemperature);
    }

    @Override
    public String toString() {
        //return "TemperatureWarning(" + getRackID() + ", " + averageTemperature + ")";
        return "TemperatureWarning(" + getRackID() + "," + averageTemperature + ") " + "," + getTimeStamp();
    }
}
The following is my class MonitoringEventSchema:
public class MonitoringEventSchema implements DeserializationSchema<TemperatureWarning>, SerializationSchema<TemperatureWarning> {

    @Override
    public TypeInformation<TemperatureWarning> getProducedType() {
        // TODO Auto-generated method stub
        return null;
    }

    @Override
    public byte[] serialize(TemperatureWarning element) {
        return element.toString().getBytes();
    }

    @Override
    public TemperatureWarning deserialize(byte[] message) throws IOException {
        if (message != null) {
            String str = new String(message, "UTF-8");
            String[] val = str.split(",");
            TemperatureWarning warning = new TemperatureWarning(Integer.parseInt(val[0]), Double.parseDouble(val[1]), Long.parseLong(val[2]));
            return warning;
        }
        return null;
    }

    @Override
    public boolean isEndOfStream(TemperatureWarning nextElement) {
        return false;
    }
}
Now, what is required to make the keyBy operation work, given that I have specified the key on which the stream should be partitioned? What needs to be done here to solve this error?
The problem is in this function:
@Override
public TypeInformation<TemperatureWarning> getProducedType() {
    // TODO Auto-generated method stub
    return null;
}
You cannot return null here.
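For example, a minimal sketch of one way to provide the type information instead (TypeInformation.of is Flink's standard helper for this; adjust to your Flink version):

@Override
public TypeInformation<TemperatureWarning> getProducedType() {
    // Give Flink concrete type information for the records this schema produces.
    return TypeInformation.of(TemperatureWarning.class);
}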