I'm a beginner with Kafka Streams.
I created two sample modules, one for "Order" and the other for "Payment".
In the "Order" project I use Kafka to send a message to "order-topic",
then in "Payment" I use a Kafka listener to receive the value and forward it to another topic (e.g. "payment-topic").
Now I want to use Kafka Streams in the "Order" module to read the values from "payment-topic". I defined it in the Order application class of the "Order" module like this:
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.annotation.EnableKafkaStreams;

@SpringBootApplication
@EnableKafka
@EnableKafkaStreams
public class OrderServiceApplication {

    // this topic will receive all values that were processed
    public static final String OUTPUT_TOPIC_NAME = "ordered";

    // these topics receive the values sent by the payment and stock services
    public static final String INPUT_ORDER_TOPIC_NAME = "order-topic-result";
    public static final String INPUT_STOCK_TOPIC_NAME = "stock-topic-result";

    public static void main(String[] args) {
        SpringApplication.run(OrderServiceApplication.class, args);
    }

    @Bean
    public KStream<String, String> readStream(StreamsBuilder kStreamBuilder) {
        KStream<String, String> input = kStreamBuilder.stream(INPUT_ORDER_TOPIC_NAME);
        // keep only values longer than 2 characters and forward them to the output topic
        KStream<String, String> output = input.filter((key, value) -> value.length() > 2);
        output.to(OUTPUT_TOPIC_NAME);
        return output;
    }
}
But readStream does not seem to do anything. Please help me.
How can I make the method run automatically whenever a value is sent to this topic?
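As a side note that may or may not be the cause: with @EnableKafkaStreams the topology is only built and started when a defaultKafkaStreamsConfig bean (a KafkaStreamsConfiguration) is available; Spring Boot can also derive it from the spring.kafka.streams.* properties. Once the embedded KafkaStreams instance is running, the filter in readStream executes automatically for every record that arrives, so no extra trigger is needed. A minimal sketch of such a configuration bean, assuming a broker on localhost:9092 and an application id chosen here purely for illustration:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.KafkaStreamsDefaultConfiguration;
import org.springframework.kafka.config.KafkaStreamsConfiguration;

@Configuration
public class OrderStreamsConfig {

    // The bean name must be "defaultKafkaStreamsConfig" for @EnableKafkaStreams to pick it up
    @Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
    public KafkaStreamsConfiguration kStreamsConfig() {
        Map<String, Object> props = new HashMap<>();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-service-streams"); // assumed id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        return new KafkaStreamsConfiguration(props);
    }
}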
The Flink consumer application I am developing reads from multiple Kafka topics. The messages published in the different topics adhere to the same schema (formatted as Avro). For schema management, I am using the Confluent Schema Registry.
I have been using the following snippet for the KafkaSource and it works just fine.
KafkaSource<MyObject> source = KafkaSource.<MyObject>builder()
        .setBootstrapServers(BOOTSTRAP_SERVERS)
        .setTopics(TOPIC_1, TOPIC_2)
        .setGroupId(GROUP_ID)
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setValueOnlyDeserializer(ConfluentRegistryAvroDeserializationSchema.forSpecific(MyObject.class, SCHEMA_REGISTRY_URL))
        .build();
Now, I want to determine the topic name for each message that I process. Since the current deserializer is value-only, I started looking into the setDeserializer() method, which I felt would give me access to the whole ConsumerRecord object so that I can fetch the topic name from it.
However, I am unable to figure out how to use that implementation. Should I implement my own deserializer? If so, how does the Schema registry fit into that implementation?
You can use the setDeserializer method with a KafkaRecordDeserializationSchema that might look something like this:
public class KafkaUsageRecordDeserializationSchema
        implements KafkaRecordDeserializationSchema<UsageRecord> {

    private static final long serialVersionUID = 1L;

    private transient ObjectMapper objectMapper;

    @Override
    public void open(DeserializationSchema.InitializationContext context) throws Exception {
        KafkaRecordDeserializationSchema.super.open(context);
        objectMapper = JsonMapper.builder().build();
    }

    @Override
    public void deserialize(
            ConsumerRecord<byte[], byte[]> consumerRecord,
            Collector<UsageRecord> collector) throws IOException {
        collector.collect(objectMapper.readValue(consumerRecord.value(), UsageRecord.class));
    }

    @Override
    public TypeInformation<UsageRecord> getProducedType() {
        return TypeInformation.of(UsageRecord.class);
    }
}
Then you can use the ConsumerRecord to access the topic and other metadata.
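For example (a sketch building on the class above, not from the original answer; it assumes UsageRecord has a topic setter), deserialize could attach the source topic to each element:

@Override
public void deserialize(
        ConsumerRecord<byte[], byte[]> consumerRecord,
        Collector<UsageRecord> collector) throws IOException {
    UsageRecord usageRecord = objectMapper.readValue(consumerRecord.value(), UsageRecord.class);
    // the full ConsumerRecord is available here, so its metadata can be copied onto the element
    usageRecord.setTopic(consumerRecord.topic());   // hypothetical setter on UsageRecord
    collector.collect(usageRecord);
}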
I took inspiration from the above answer (by David) and added the following custom deserializer -
KafkaSource<Event> source = KafkaSource.<Event>builder()
        .setBootstrapServers(BOOTSTRAP_SERVERS)
        .setTopics(TOPIC_1, TOPIC_2)
        .setGroupId(GROUP_ID)
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setDeserializer(KafkaRecordDeserializationSchema.of(new KafkaDeserializationSchema<Event>() {

            private final DeserializationSchema<MyObject> deserializationSchema =
                    ConfluentRegistryAvroDeserializationSchema.forSpecific(MyObject.class, SCHEMA_REGISTRY_URL);

            @Override
            public boolean isEndOfStream(Event nextElement) {
                return false;
            }

            @Override
            public Event deserialize(ConsumerRecord<byte[], byte[]> consumerRecord) throws Exception {
                // wrap the Avro value together with the topic it came from
                Event event = new Event();
                event.setTopicName(consumerRecord.topic());
                event.setMyObject(deserializationSchema.deserialize(consumerRecord.value()));
                return event;
            }

            @Override
            public TypeInformation<Event> getProducedType() {
                return TypeInformation.of(Event.class);
            }
        }))
        .build();
The Event class is a wrapper over the MyObject class with additional field for storing the topic name.
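For completeness, a minimal sketch of such a wrapper (the field and accessor names here are assumptions, not from the original post):

public class Event {

    private String topicName;  // topic the record was read from
    private MyObject myObject; // the deserialized Avro payload

    public String getTopicName() { return topicName; }
    public void setTopicName(String topicName) { this.topicName = topicName; }

    public MyObject getMyObject() { return myObject; }
    public void setMyObject(MyObject myObject) { this.myObject = myObject; }
}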
I have two Kafka listeners like below:

@KafkaListener(topics = {"foo1", "foo2"}, groupId = "${foo.id}", id = "foo")
public void fooTopics(@Header(KafkaHeaders.RECEIVED_TOPIC) String topic, String message, Acknowledgment acknowledgment) {
    // processing
}

@KafkaListener(topics = {"Bar1", "Bar2"}, groupId = "${bar.id}", id = "bar")
public void barTopics(@Header(KafkaHeaders.RECEIVED_TOPIC) String topic, String message, Acknowledgment acknowledgment) {
    // processing
}

The same application is running on two instances, inc1 and inc2. Is there a way to assign the foo listener to inc1 and the bar listener to inc2, and, if one instance goes down, have both listeners (foo and bar) run on the remaining instance?
You can use the @KafkaListener property autoStartup, introduced in version 2.2.
When an instance dies, you can start the corresponding listener on the other instance like so:
@Autowired
private KafkaListenerEndpointRegistry registry;

...

@KafkaListener(topics = {"foo1", "foo2"}, groupId = "${foo.id}", id = "foo", autoStartup = "false")
public void fooTopics(@Header(KafkaHeaders.RECEIVED_TOPIC) String topic, String message, Acknowledgment acknowledgment) {
    // processing
}

// Start-up condition
registry.getListenerContainer("foo").start();
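How the failover is detected is up to you; as one hedged illustration (the service and method names below are made up), the surviving instance could start or stop a listener by its id through the registry:

@Service
public class ListenerFailoverService {

    @Autowired
    private KafkaListenerEndpointRegistry registry;

    // Called by whatever monitoring/health mechanism detects that the other instance is down
    public void takeOver(String listenerId) {
        MessageListenerContainer container = registry.getListenerContainer(listenerId);
        if (container != null && !container.isRunning()) {
            container.start();
        }
    }

    // Optionally hand the listener back once the other instance recovers
    public void release(String listenerId) {
        MessageListenerContainer container = registry.getListenerContainer(listenerId);
        if (container != null && container.isRunning()) {
            container.stop();
        }
    }
}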
From what I can tell, with Flink's Avro deserialization you can create a stream of Avro objects, and that's fine, but there seems to be an issue where Flink's Kafka consumer only creates streams of a single object type:
FlinkKafkaConsumerBase<T>, as opposed to the default Kafka API with its KafkaConsumer.
In my case both the key and the value are separate Avro-schema-compliant objects, and merging their schemas might be a nightmare...
Additionally, it seems that with the Flink API I can't retrieve the ConsumerRecord information?
Based on the Flink Kafka Consumer, there is a constructor:
public FlinkKafkaConsumer(String topic, KeyedDeserializationSchema<T> deserializer, Properties props) {
    this(Collections.singletonList(topic), deserializer, props);
}
The second parameter, KeyedDeserializationSchema, is used to deserialise the Kafka record. It includes the message key, message value, offset, topic, etc. So you can implement your own type, say MyKafkaRecord, holding the Avro key and the Avro value, and pass MyKafkaRecord as T to your implementation of KeyedDeserializationSchema. Refer to TypeInformationKeyValueSerializationSchema as an example.
E.g. Reading extra info from Kafka:
class KafkaRecord<K, V> {
    private K key;
    private V value;
    private long offset;
    private int partition;
    private String topic;
    ...
}

class MySchema<K, V> implements KeyedDeserializationSchema<KafkaRecord<K, V>> {

    public KafkaRecord<K, V> deserialize(byte[] messageKey, byte[] message, String topic, int partition, long offset) {
        KafkaRecord<K, V> rec = new KafkaRecord<>();
        rec.key = KEY_DESERIALISER.deserialize(messageKey);
        rec.value = ...;
        rec.topic = topic;
        ...
        return rec;
    }
}
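Usage would then look something like this (a sketch; the topic name, the key/value types, the properties, and the env variable are assumptions for illustration):

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker
props.setProperty("group.id", "my-consumer-group");       // assumed group id

// Each stream element now carries key, value, topic, partition and offset together
FlinkKafkaConsumer<KafkaRecord<MyKey, MyValue>> consumer =
        new FlinkKafkaConsumer<>("my-topic", new MySchema<MyKey, MyValue>(), props);

DataStream<KafkaRecord<MyKey, MyValue>> stream = env.addSource(consumer);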
I know I can find out which partition a record comes from, but I wonder: is there any way to dynamically get which partitions are assigned to the consumers at a specific moment? Maybe I need to implement some listener to detect and track the partition assignment info?
I am using spring-kafka 1.3.2 with ConcurrentKafkaListenerContainerFactory and @KafkaListener.
Yes, you can do:
@Bean
public ConsumerAwareRebalanceListener rebalanceListener() {
    return new ConsumerAwareRebalanceListener() {

        @Override
        public void onPartitionsAssigned(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
            // the currently assigned partitions are available here
        }
    };
}
And then add it, for example, to the ConcurrentKafkaListenerContainerFactory:
@Bean
public ConcurrentKafkaListenerContainerFactory<Object, Object> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<Object, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    // configure the consumer factory etc. as usual, then attach the rebalance listener
    ContainerProperties props = factory.getContainerProperties();
    props.setConsumerRebalanceListener(rebalanceListener());
    return factory;
}
I did it in a different way, using KafkaListenerEndpointRegistry:

List<Integer> partitions = new ArrayList<>();
for (MessageListenerContainer messageListenerContainer : kafkaListenerEndpointRegistry.getListenerContainers()) {
    List<KafkaMessageListenerContainer> containers =
            ((ConcurrentMessageListenerContainer) messageListenerContainer).getContainers();
    List<TopicPartition> topicPartitions = (List<TopicPartition>) containers.stream()
            .flatMap(kafkaMessageListenerContainer -> kafkaMessageListenerContainer.getAssignedPartitions().stream())
            .collect(Collectors.toList());
    // collect just the partition numbers
    partitions.addAll(topicPartitions.stream().map(TopicPartition::partition).collect(Collectors.toList()));
}
I am trying to create a Spring Boot application with Spring Cloud Stream and Kafka integration. I created a sample topic in Kafka with one partition and have published to the topic from the Spring Boot application, created based on the directions given here
http://docs.spring.io/spring-cloud-stream/docs/1.0.2.RELEASE/reference/htmlsingle/index.html
and
https://blog.codecentric.de/en/2016/04/event-driven-microservices-spring-cloud-stream/
Spring Boot App -
@SpringBootApplication
public class MyApplication {

    private static final Log logger = LogFactory.getLog(MyApplication.class);

    public static void main(String[] args) {
        SpringApplication.run(MyApplication.class, args);
    }
}
Kafka Producer Class
@Service
@EnableBinding(Source.class)
public class MyProducer {

    private static final Log logger = LogFactory.getLog(MyProducer.class);

    @Bean
    @InboundChannelAdapter(value = Source.OUTPUT, poller = @Poller(fixedDelay = "10000", maxMessagesPerPoll = "1"))
    public MessageSource<TimeInfo> timerMessageSource() {
        TimeInfo t = new TimeInfo(new Timestamp(new Date().getTime()) + "", "Label");
        MessageBuilder<TimeInfo> m = MessageBuilder.withPayload(t);
        return () -> m.build();
    }

    public static class TimeInfo {

        private String time;
        private String label;

        public TimeInfo(String time, String label) {
            super();
            this.time = time;
            this.label = label;
        }

        public String getTime() {
            return time;
        }

        public String getLabel() {
            return label;
        }
    }
}
All is working well except when I want to handle exceptions.
If the Kafka topic is unavailable (for example the broker is down), I can see the ConnectionRefused exception being thrown in the app's log files, but the built-in retry logic keeps retrying continuously without stopping!
No exception is thrown up to me at all to handle and do further exception processing. I have read through the producer options and the Kafka binder options in the Spring Cloud Stream documentation above, and I cannot see any customization option that would propagate this exception all the way up for me to capture.
I am new to Spring Boot / Spring Cloud Stream / Spring Integration (which seems to be the underlying implementation of the Cloud Stream project).
Is there anything else you know of to get this exception cascaded up to my Spring Cloud Stream app?
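One direction that might be worth exploring (a sketch under assumptions, not a confirmed answer): exceptions raised while the @InboundChannelAdapter poller builds or sends a message are routed by Spring Integration's default error handler to the global errorChannel, so a handler subscribed there can at least observe and react to them. The class and method names below are made up for illustration:

@Service
public class ProducerErrorHandler {

    private static final Log logger = LogFactory.getLog(ProducerErrorHandler.class);

    // Subscribes to Spring Integration's global error channel
    @ServiceActivator(inputChannel = "errorChannel")
    public void handle(ErrorMessage errorMessage) {
        Throwable cause = errorMessage.getPayload();
        logger.error("Failed to publish message", cause);
        // custom handling here: alerting, stopping the poller adapter, dead-lettering, etc.
    }
}

Whether the binder's connection failures actually reach this channel depends on how the retry inside the Kafka binder is configured, so treat this as a starting point rather than a guaranteed fix.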