Adding data to state store for stateful processing and fault tolerance - apache-kafka

I have a microservice that performs some stateful processing. The application constructs a KStream from an input topic, does some stateful processing, then writes data to the output topic.
I will be running 3 instances of this application in the same group. There are 3 parameters that I need to store so that, when a microservice goes down, the microservice that takes over can query the shared state store and continue where the crashed service left off.
I am thinking of pushing these 3 parameters into a state store and querying the data when the other microservice takes over. From my research, I have seen a lot of examples where people perform event counting using a state store, but that's not exactly what I want. Does anyone know an example, or what the right approach for this problem is?

So you want to do 2 things:
a. the service going down has to store the parameters:
If you want to do it in a straightforward way, then all you have to do is write a message to the topic associated with the state store (the one you are reading with a KTable). Use the Kafka Producer API or a KStream (could be kTable.toStream()) to do it and that's it.
Otherwise you could create a state store manually:
// take these serde as just an example
Serde<String> keySerde = Serdes.String();
Serde<String> valueSerde = Serdes.String();
KeyValueBytesStoreSupplier storeSupplier = Stores.inMemoryKeyValueStore(stateStoreName);
streamsBuilder.addStateStore(Stores.keyValueStoreBuilder(storeSupplier, keySerde, valueSerde));
then use it in a transformer or processor to add items to it; you'll have to declare this in the transformer/processor:
// depending on the serde above you might have something other than String
private KeyValueStore<String, String> stateStore;
and initialize the stateStore variable:
@Override
public void init(ProcessorContext context) {
    stateStore = (KeyValueStore<String, String>) context.getStateStore(stateStoreName);
}
and later use the stateStore variable:
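@Override
public KeyValue<String, String> transform(String key, String value) {
    // use stateStore among the other actions you might take here;
    // processedValue stands for whatever your processing produces from value
    stateStore.put(key, processedValue);
    return KeyValue.pair(key, processedValue);
}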
b. read the parameters in the service taking over:
You could do it with a Kafka consumer, but with Kafka Streams you first have to make the store available. The easiest way is to create a KTable. Then get the queryable store name that is created with the KTable, get access to the store, and finally extract a record value from the store (i.e. a parameter value by its key).
// this example is a modified copy of KTable javadocs example
final StreamsBuilder streamsBuilder = new StreamsBuilder();
// When you create a KTable over the topic containing your parameters, a store is created automatically.
//
// The serde for your MyParametersClassType could be
// new org.springframework.kafka.support.serializer.JsonSerde(MyParametersClassType.class)
// though further configurations might be necessary here - e.g. setting the trusted packages for the ObjectMapper behind JsonSerde.
//
// If the parameter-value class is a String then you could use Serdes.String() instead of a MyParametersClassType serde.
final KTable<String, InstanceOfMyParametersClassType> paramsTable = streamsBuilder.table("parametersTopicName", Consumed.with(Serdes.String(), <<your InstanceOfMyParametersClassType serde>>));
...
// see the example from KafkaStreams javadocs for more KafkaStreams related details
final KafkaStreams streams = ...;
streams.start();
...
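// get the queryable store name that is created with the KTable
// (queryableStoreName() returns null if the KTable is not queryable)
final String queryableStoreName = paramsTable.queryableStoreName();
// get access to the store; a timestamped store wraps each value in a ValueAndTimestamp
ReadOnlyKeyValueStore<String, ValueAndTimestamp<InstanceOfMyParametersClassType>> view = streams.store(queryableStoreName, QueryableStoreTypes.timestampedKeyValueStore());
// extract a record value from the store (null if the key is absent)
InstanceOfMyParametersClassType parameter = ValueAndTimestamp.getValueOrNull(view.get(key));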

Related

Can the state stores in Kafka be shared across streams?

We have a scenario where a statestore having some values from one kstream needs to be accessed in another kstream, is there any way to achieve this?
They can be accessed with Interactive Queries.
Between applications or instances of the same application, you need to use RPC calls such as adding an HTTP or gRPC server.
https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html
You can attach the same state store to multiple processors if you use the Processor API, but also if you use the Processor API Integration in the DSL.
There are two ways to do that (see javadocs). You can either manually add the store to the processors, like:
// create store
StoreBuilder<KeyValueStore<String,String>> keyValueStoreBuilder =
    Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("myProcessorState"),
        Serdes.String(),
        Serdes.String());
// add store
builder.addStateStore(keyValueStoreBuilder);
// connect the store to the processor by passing its name to process()
KStream outputStream = inputStream.process(new ProcessorSupplier() {
    public Processor get() {
        return new MyProcessor();
    }
}, "myProcessorState");
or you can implement stores() on the passed in ProcessorSupplier:
class MyProcessorSupplier implements ProcessorSupplier {
    // supply processor
    public Processor get() {
        return new MyProcessor();
    }
    // provide store(s) that will be added and connected to the associated processor
    // the store name from the builder ("myProcessorState") is used to access the store later via the ProcessorContext
    public Set<StoreBuilder<?>> stores() {
        StoreBuilder<KeyValueStore<String, String>> keyValueStoreBuilder =
            Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("myProcessorState"),
                Serdes.String(),
                Serdes.String());
        return Collections.singleton(keyValueStoreBuilder);
    }
}
These are examples for KStream#process(), but it works similarly for the family of KStream#*transform*() methods.

Create message listeners dynamically for the topics

I was analysing a problem on creating a generic consumer library which can be deployed in multiple microservices (all of them Spring-based). The requirement is to listen to around 15-20 topics. If we use annotation-based Kafka listeners, we need to add more code to each microservice. Is there any way to create the consumers dynamically based on some XML file, where each consumer can have the following data injected:
topic
groupid
partition
filter (if any)
With annotations, the design is very rigid. The only way I can think of is to create message listeners after parsing the XML config, with each topic having its own ConcurrentMessageListenerContainer.
Is there a better alternative approach available using Spring?
P.S.: I am a little new to Spring and Kafka. Please let me know if there is any confusion in how I explained the requirements.
Thanks,
Rajasekhar
Maybe you can use topic patterns. Take a look at the consumer properties. E.g. the listener
@KafkaListener(topicPattern = "topic1|topic2")
will listen to topic1 and topic2.
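To see which topic names an expression like this would cover before wiring it into a listener, it can be tried out with plain java.util.regex. This is only a sketch: it assumes the pattern has to match the whole topic name (hence matches() rather than find()), and the class name TopicPatternCheck and the sample topic names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class TopicPatternCheck {
    // Returns the topic names that the regular expression matches in full.
    static List<String> matchingTopics(String topicPattern, List<String> topicNames) {
        Pattern pattern = Pattern.compile(topicPattern);
        List<String> matched = new ArrayList<>();
        for (String name : topicNames) {
            if (pattern.matcher(name).matches()) {
                matched.add(name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // hypothetical topic names, only for trying the pattern out
        List<String> topics = Arrays.asList("topic1", "topic2", "topic3", "topic10");
        System.out.println(matchingTopics("topic1|topic2", topics));
    }
}
```

With full-name matching, "topic10" is not picked up by "topic1|topic2"; a broader expression such as "topic.*" would be needed to cover it.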
If you need to create a listener dynamically, extra care must be taken, because you must also shut it down.
I would use a similar approach to Spring's KafkaListenerAnnotationBeanPostProcessor. This post processor is responsible for processing @KafkaListener annotations.
Here is a proposal of how it could work:
public class DynamicEndpointRegistrar {
    private BeanFactory beanFactory;
    private KafkaListenerContainerFactory<?> containerFactory;
    private KafkaListenerEndpointRegistry endpointRegistry;
    private MessageHandlerMethodFactory messageHandlerMethodFactory;
    public DynamicEndpointRegistrar(BeanFactory beanFactory,
            KafkaListenerContainerFactory<?> containerFactory,
            KafkaListenerEndpointRegistry endpointRegistry,
            MessageHandlerMethodFactory messageHandlerMethodFactory) {
        this.beanFactory = beanFactory;
        this.containerFactory = containerFactory;
        this.endpointRegistry = endpointRegistry;
        this.messageHandlerMethodFactory = messageHandlerMethodFactory;
    }
    public void registerMethodEndpoint(String endpointId, Object bean, Method method, Properties consumerProperties,
            String... topics) throws Exception {
        KafkaListenerEndpointRegistrar registrar = new KafkaListenerEndpointRegistrar();
        registrar.setBeanFactory(beanFactory);
        registrar.setContainerFactory(containerFactory);
        registrar.setEndpointRegistry(endpointRegistry);
        registrar.setMessageHandlerMethodFactory(messageHandlerMethodFactory);
        MethodKafkaListenerEndpoint<Integer, String> endpoint = new MethodKafkaListenerEndpoint<>();
        endpoint.setBeanFactory(beanFactory);
        endpoint.setMessageHandlerMethodFactory(messageHandlerMethodFactory);
        endpoint.setId(endpointId);
        endpoint.setGroupId(consumerProperties.getProperty(ConsumerConfig.GROUP_ID_CONFIG));
        endpoint.setBean(bean);
        endpoint.setMethod(method);
        endpoint.setConsumerProperties(consumerProperties);
        endpoint.setTopics(topics);
        registrar.registerEndpoint(endpoint);
        registrar.afterPropertiesSet();
    }
}
You should then be able to register a listener dynamically. E.g.
DynamicEndpointRegistrar dynamicEndpointRegistrar = ...;
MyConsumer myConsumer = ...; // create an instance of your consumer
Properties properties = ...; // consumer properties
// the method that should be invoked
// (the method that's normally annotated with @KafkaListener)
Method method = MyConsumer.class.getDeclaredMethod("consume", String.class);
dynamicEndpointRegistrar.registerMethodEndpoint("endpointId", myConsumer, method, properties, "topic");

Stateful filtering/flatMapValues in Kafka Streams?

I'm trying to write a simple Kafka Streams application (targeting Kafka 2.2/Confluent 5.2) to transform an input topic with at-least-once semantics into an exactly-once output stream. I'd like to encode the following logic:
For each message with a given key:
Read a message timestamp from a string field in the message value
Retrieve the greatest timestamp we've previously seen for this key from a local state store
If the message timestamp is less than or equal to the timestamp in the state store, don't emit anything
If the timestamp is greater than the timestamp in the state store, or the key doesn't exist in the state store, emit the message and update the state store with the message's key/timestamp
(This is guaranteed to provide correct results based on ordering guarantees that we get from the upstream system; I'm not trying to do anything magical here.)
At first I thought I could do this with the Kafka Streams flatMapValues operator, which lets you map each input message to zero or more output messages with the same key. However, that documentation explicitly warns:
This is a stateless record-by-record operation (cf. transformValues(ValueTransformerSupplier, String...) for stateful value transformation).
That sounds promising, but the transformValues documentation doesn't make it clear how to emit zero or one output messages per input message. Unless that's what the // or null aside in the example is trying to say?
flatTransform also looked somewhat promising, but I don't need to manipulate the key, and if possible I'd like to avoid repartitioning.
Anyone know how to properly perform this kind of filtering?
You could use a Transformer to implement stateful operations like the one you described. To avoid propagating a message downstream, return null from the transform method; this is mentioned in the Transformer Javadoc. You can then control propagation yourself via processorContext.forward(key, value). A simplified example is provided below:
kStream.transform(() -> new DemoTransformer(stateStoreName), stateStoreName);
public class DemoTransformer implements Transformer<String, String, KeyValue<String, String>> {
    private ProcessorContext processorContext;
    private String stateStoreName;
    private KeyValueStore<String, String> keyValueStore;
    public DemoTransformer(String stateStoreName) {
        this.stateStoreName = stateStoreName;
    }
    @Override
    public void init(ProcessorContext processorContext) {
        this.processorContext = processorContext;
        this.keyValueStore = (KeyValueStore<String, String>) processorContext.getStateStore(stateStoreName);
    }
    @Override
    public KeyValue<String, String> transform(String key, String value) {
        // value previously stored for this key, or null if the key has not been seen yet
        String existingValue = keyValueStore.get(key);
        if (/* your condition */) {
            // emit the record explicitly and remember it in the store
            processorContext.forward(key, value);
            keyValueStore.put(key, value);
        }
        // returning null means nothing else is propagated downstream
        return null;
    }
    @Override
    public void close() {
    }
}
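The condition left as a placeholder in the transformer above could follow the rule from the question: emit only when the key is new or the timestamp is strictly greater than the stored one. The sketch below shows just that decision in plain Java. It assumes the timestamp has already been parsed from the message value into a long, and it uses a HashMap as a stand-in for the key-value state store; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class LatestTimestampFilter {
    // Stand-in for the state store: greatest timestamp seen so far for each key.
    private final Map<String, Long> latestByKey = new HashMap<>();

    // Returns true when the record should be emitted, and remembers its timestamp in that case.
    boolean shouldEmit(String key, long timestamp) {
        Long previous = latestByKey.get(key);
        if (previous != null && timestamp <= previous) {
            return false; // duplicate or older record: emit nothing, leave the store untouched
        }
        latestByKey.put(key, timestamp);
        return true;
    }

    public static void main(String[] args) {
        LatestTimestampFilter filter = new LatestTimestampFilter();
        System.out.println(filter.shouldEmit("a", 10L)); // first record for the key
        System.out.println(filter.shouldEmit("a", 10L)); // same timestamp again
        System.out.println(filter.shouldEmit("a", 11L)); // newer timestamp
    }
}
```

In the transformer, the same check would read from and write to keyValueStore instead of the map, with forward() called only when the check passes.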

Kafka Streams persistent store error: the state store, may have migrated to another instance

I am using Kafka Streams with Spring Boot. In my use case, when I receive a customer event from another microservice I need to store it in a customer materialized view, and when I receive an order event, I need to join customer and order and then store the result in a customer-order materialized view. To achieve this I created a persistent key-value store, customer-store, and update it when a new event comes.
StoreBuilder customerStateStore = Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("customer"),Serdes.String(), customerSerde).withLoggingEnabled(new HashMap<>());
streamsBuilder.addStateStore(customerStateStore);
KTable<String,Customer> customerKTable=streamsBuilder.table("customer",Consumed.with(Serdes.String(),customerSerde));
customerKTable.foreach(((key, value) -> System.out.println("Customer from Topic: "+value)));
I configured the Topology and Streams and started the streams object. When I try to access the store using ReadOnlyKeyValueStore, I get the following exception, even though I stored some objects a few moments ago:
streams.start();
ReadOnlyKeyValueStore<String, Customer> customerStore = streams.store("customer", QueryableStoreTypes.keyValueStore());
System.out.println("customerStore.approximateNumEntries()-> " + customerStore.approximateNumEntries());
Code uploaded to Github for reference. Appreciate your help.
Exception:
org.apache.kafka.streams.errors.InvalidStateStoreException: the state store, customer, may have migrated to another instance.
at org.apache.kafka.streams.state.internals.QueryableStoreProvider.getStore(QueryableStoreProvider.java:60)
at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:1043)
at com.kafkastream.service.EventsListener.main(EventsListener.java:94)
The state store usually needs some time to become ready. The simplest approach is to retry until it is queryable, as below (code from the official documentation):
public static <T> T waitUntilStoreIsQueryable(final String storeName,
                                              final QueryableStoreType<T> queryableStoreType,
                                              final KafkaStreams streams) throws InterruptedException {
    while (true) {
        try {
            return streams.store(storeName, queryableStoreType);
        } catch (InvalidStateStoreException ignored) {
            // store not yet ready for querying
            Thread.sleep(100);
        }
    }
}
You can find additional info in the documentation:
https://docs.confluent.io/current/streams/faq.html#interactive-queries
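The loop above retries forever, so a store that never becomes queryable would block the caller indefinitely. A bounded variant is sketched below in plain Java as an illustration only: BoundedRetry, retryOn, maxAttempts and sleepMillis are hypothetical names, and the lookup is passed in as a Supplier so the sketch does not depend on Kafka classes.

```java
import java.util.function.Supplier;

final class BoundedRetry {
    // Calls lookup until it succeeds. Exceptions of type retryOn are retried after a pause;
    // any other exception, or the retryOn exception on the last attempt, is rethrown.
    static <T> T retry(Supplier<T> lookup,
                       Class<? extends RuntimeException> retryOn,
                       int maxAttempts,
                       long sleepMillis) throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return lookup.get();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(sleepMillis); // not ready yet: wait before the next attempt
            }
        }
    }
}
```

Under these assumptions, the store lookup could be wrapped as BoundedRetry.retry(() -> streams.store(storeName, queryableStoreType), InvalidStateStoreException.class, 50, 100), which gives up after roughly five seconds instead of looping forever.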

getting statestore data from called function in kafka streams

In Kafka Streams' Processor API, can I pass processor context from init() as follows to other function and get the context back with state store in process()?
public void init(ProcessorContext context) {
    this.context = context;
    String resourceName = "config.properties";
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    Properties props = new Properties();
    try (InputStream resourceStream = loader.getResourceAsStream(resourceName)) {
        props.load(resourceStream);
    } catch (IOException e) {
        e.printStackTrace();
    }
    dataSplitter.timerMessageSource(props, context); // can I pass context like this?
    this.context.schedule(1000);
    // retrieve the key-value store named "patient"
    kvStore = (KeyValueStore<String, PatientDataSummary>) this.context.getStateStore("patient");
    // want to get the value of statestore filled by the called function timerMessageSource(), as the data to be put in statestore is getting generated in timerMessageSource()
    // is there any way I can get that by using context or so
}
The usage of ProcessorContext is somewhat limited, and you cannot call each method it provides at arbitrary times. Thus, it depends on how you use it; in general, you can pass it around as you wish (it will always be the same object throughout the lifetime of the processor).
If I understand your question correctly, you register a punctuation and use your dataSplitter within the punctuation callback and want to modify the store. That is absolutely possible -- you can either put the store into a class member similar to what you do with the context or use the context object to get the store within the punctuate callback.