Can the state stores in Kafka be shared across streams?

We have a scenario where a state store populated by one KStream needs to be accessed from another KStream. Is there any way to achieve this?

They can be accessed with Interactive Queries.
To query a store from another application, or from another instance of the same application, you need an RPC layer of your own, such as an HTTP or gRPC server.
https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html
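As a minimal sketch of the local side of an interactive query (assuming a running KafkaStreams instance and a key-value store named "myStore"; both names are placeholders, and the StoreQueryParameters variant shown requires Kafka 2.5+, older versions use streams.store(name, type)):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// "streams" is the already started KafkaStreams instance
ReadOnlyKeyValueStore<String, String> store =
    streams.store(StoreQueryParameters.fromNameAndType("myStore", QueryableStoreTypes.keyValueStore()));

// read-only lookup of a single key; if the key lives on another instance,
// that instance has to be reached via your own RPC layer
String value = store.get("someKey");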

You can attach the same state store to multiple processors if you use the Processor API, but also if you use the Processor API Integration in the DSL.
There are two ways to do that (see javadocs). You can either manually add the store to the processors, like:
// create the store
StoreBuilder<KeyValueStore<String, String>> keyValueStoreBuilder =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("myProcessorState"),
        Serdes.String(),
        Serdes.String());

// add the store to the topology
builder.addStateStore(keyValueStoreBuilder);

// connect the store to the processor by name
KStream outputStream = inputStream.process(new ProcessorSupplier() {
    public Processor get() {
        return new MyProcessor();
    }
}, "myProcessorState");
or you can implement stores() on the passed-in ProcessorSupplier:
class MyProcessorSupplier implements ProcessorSupplier {

    // supply the processor
    public Processor get() {
        return new MyProcessor();
    }

    // provide the store(s) that will be added to the topology and connected to the associated processor;
    // the store name from the builder ("myProcessorState") is used to access the store later via the ProcessorContext
    public Set<StoreBuilder> stores() {
        StoreBuilder<KeyValueStore<String, String>> keyValueStoreBuilder =
            Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("myProcessorState"),
                Serdes.String(),
                Serdes.String());
        return Collections.singleton(keyValueStoreBuilder);
    }
}
These are examples for KStream#process(), but it works similarly for the family of KStream#*transform*() methods.
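For completeness, a minimal sketch of what MyProcessor itself might look like (the class body and the store-the-latest-value logic are assumptions; only the store name "myProcessorState" comes from the snippets above). It uses the classic Processor API via AbstractProcessor; newer Kafka versions would implement org.apache.kafka.streams.processor.api.Processor instead:

import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

class MyProcessor extends AbstractProcessor<String, String> {

    private KeyValueStore<String, String> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        super.init(context);
        // the name must match the store registered and connected above
        store = (KeyValueStore<String, String>) context.getStateStore("myProcessorState");
    }

    @Override
    public void process(String key, String value) {
        // example logic: remember the latest value seen per key
        store.put(key, value);
    }
}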

Related

Create message listeners dynamically for the topics

I was analysing how to create a generic consumer library that can be deployed in multiple microservices (all of them Spring based). The requirement is to listen to around 15-20 topics. If we use annotation-based Kafka listeners, we need to add more code for each microservice. Is there any way to create the consumers dynamically based on some XML file, where each consumer has the following data injected:
topic
groupid
partition
filter (if any)
With annotations the design is very rigid. The only way I can think of is to create message listeners after parsing the XML config, where each topic gets its own ConcurrentMessageListenerContainer.
Is there a better alternative approach available using Spring?
P.S.: I am a little new to Spring and Kafka. Please let me know if the requirements are unclear.
Thanks,
Rajasekhar
Maybe you can use topic patterns. Take a look at the consumer properties. E.g. the listener
@KafkaListener(topicPattern = "topic1|topic2")
will listen to both topic1 and topic2.
If you need to create a listener dynamically, extra care must be taken, because you must also shut it down.
I would use a similar approach to Spring's KafkaListenerAnnotationBeanPostProcessor. This post processor is responsible for processing @KafkaListener annotations.
Here is a proposal of how it could work:
public class DynamicEndpointRegistrar {

    private BeanFactory beanFactory;
    private KafkaListenerContainerFactory<?> containerFactory;
    private KafkaListenerEndpointRegistry endpointRegistry;
    private MessageHandlerMethodFactory messageHandlerMethodFactory;

    public DynamicEndpointRegistrar(BeanFactory beanFactory,
            KafkaListenerContainerFactory<?> containerFactory,
            KafkaListenerEndpointRegistry endpointRegistry,
            MessageHandlerMethodFactory messageHandlerMethodFactory) {
        this.beanFactory = beanFactory;
        this.containerFactory = containerFactory;
        this.endpointRegistry = endpointRegistry;
        this.messageHandlerMethodFactory = messageHandlerMethodFactory;
    }

    public void registerMethodEndpoint(String endpointId, Object bean, Method method,
            Properties consumerProperties, String... topics) throws Exception {

        KafkaListenerEndpointRegistrar registrar = new KafkaListenerEndpointRegistrar();
        registrar.setBeanFactory(beanFactory);
        registrar.setContainerFactory(containerFactory);
        registrar.setEndpointRegistry(endpointRegistry);
        registrar.setMessageHandlerMethodFactory(messageHandlerMethodFactory);

        MethodKafkaListenerEndpoint<Integer, String> endpoint = new MethodKafkaListenerEndpoint<>();
        endpoint.setBeanFactory(beanFactory);
        endpoint.setMessageHandlerMethodFactory(messageHandlerMethodFactory);
        endpoint.setId(endpointId);
        endpoint.setGroupId(consumerProperties.getProperty(ConsumerConfig.GROUP_ID_CONFIG));
        endpoint.setBean(bean);
        endpoint.setMethod(method);
        endpoint.setConsumerProperties(consumerProperties);
        endpoint.setTopics(topics);

        registrar.registerEndpoint(endpoint);
        registrar.afterPropertiesSet();
    }
}
You should then be able to register a listener dynamically. E.g.
DynamicEndpointRegistrar dynamicEndpointRegistrar = ...;
MyConsumer myConsumer = ...;   // create an instance of your consumer
Properties properties = ...;   // consumer properties

// the method that should be invoked
// (the method that would normally be annotated with @KafkaListener)
Method method = MyConsumer.class.getDeclaredMethod("consume", String.class);

dynamicEndpointRegistrar.registerMethodEndpoint("endpointId", myConsumer, method, properties, "topic");
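For context, a minimal sketch of the hypothetical MyConsumer class assumed above (class and method names are placeholders; only the consume(String) signature has to match the getDeclaredMethod lookup):

public class MyConsumer {

    // invoked for every record received on the dynamically registered endpoint;
    // the single String parameter matches getDeclaredMethod("consume", String.class)
    public void consume(String payload) {
        System.out.println("Received: " + payload);
    }
}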

Adding data to state store for stateful processing and fault tolerance

I have a microservice that performs some stateful processing. The application constructs a KStream from an input topic, does some stateful processing and then writes the data to the output topic.
I will be running 3 instances of this application in the same group. There are 3 parameters that I need to store so that, when one instance goes down, the instance that takes over can query the shared state store and continue where the crashed service left off.
I am thinking of pushing these 3 parameters into a state store and querying the data when the other instance takes over. From my research I have seen a lot of examples where people do event counting with a state store, but that's not exactly what I want. Does anyone know an example, or what the right approach for this problem is?
So you want to do 2 things:
a. the service going down has to store the parameters:
If you want to do it in a straightforward way, then all you have to do is write a message to the topic associated with the state store (the one you are reading with a KTable). Use the Kafka Producer API or a KStream (it could be kTable.toStream()) to do it, and that's it.
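A minimal sketch of the producer route, assuming String keys and values and the placeholder topic name parametersTopicName used further below (the broker address and the key/value are illustrative only):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    // write the parameter under a stable key so the latest value wins in the KTable
    producer.send(new ProducerRecord<>("parametersTopicName", "parameter-key", "parameter-value"));
}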
Otherwise you could manually create a state store:
// take these serdes as just an example
Serde<String> keySerde = Serdes.String();
Serde<String> valueSerde = Serdes.String();

KeyValueBytesStoreSupplier storeSupplier = Stores.inMemoryKeyValueStore(stateStoreName);
streamsBuilder.addStateStore(Stores.keyValueStoreBuilder(storeSupplier, keySerde, valueSerde));
then use it in a transformer or processor to add items to it; you'll have to declare this in the transformer/processor:
// depending on the serdes above, you might have something other than String here
private KeyValueStore<String, String> stateStore;
and initialize the stateStore variable:
@Override
public void init(ProcessorContext context) {
    stateStore = (KeyValueStore<String, String>) context.getStateStore(stateStoreName);
}
and later use the stateStore variable:
@Override
public KeyValue<String, String> transform(String key, String value) {
    // use the stateStore here, among other actions you might take
    stateStore.put(key, processedValue);
    return KeyValue.pair(key, processedValue);
}
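Putting those fragments together, a minimal Transformer sketch could look like this (the class name and the pass-through logic are assumptions; stateStoreName must match the store registered above). It would be attached with something like inputStream.transform(() -> new ParameterStoringTransformer(stateStoreName), stateStoreName):

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class ParameterStoringTransformer implements Transformer<String, String, KeyValue<String, String>> {

    private final String stateStoreName;
    private KeyValueStore<String, String> stateStore;

    public ParameterStoringTransformer(String stateStoreName) {
        this.stateStoreName = stateStoreName;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        stateStore = (KeyValueStore<String, String>) context.getStateStore(stateStoreName);
    }

    @Override
    public KeyValue<String, String> transform(String key, String value) {
        // example logic: store the incoming value as-is and forward it unchanged
        stateStore.put(key, value);
        return KeyValue.pair(key, value);
    }

    @Override
    public void close() {}
}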
b. read the parameters in the service taking over:
You could do it with a plain Kafka consumer, but with Kafka Streams you first have to make the store available. The easiest way to do that is by creating a KTable; then you get the queryable store name that is automatically created with the KTable, get access to the store, and finally extract a record value from the store (i.e. a parameter value by its key).
// this example is a modified copy of the KTable javadocs example
final StreamsBuilder streamsBuilder = new StreamsBuilder();

// By creating a KTable over the topic containing your parameters, a store is created automatically.
//
// The serde for your MyParametersClassType could be
// new org.springframework.kafka.support.serializer.JsonSerde<>(MyParametersClassType.class),
// though further configuration might be necessary here - e.g. setting the trusted packages
// for the ObjectMapper behind JsonSerde.
//
// If the parameter value class is a String, you could use Serdes.String() instead of a MyParametersClassType serde.
final KTable<String, MyParametersClassType> paramsTable =
    streamsBuilder.table("parametersTopicName", Consumed.with(Serdes.String(), <<your MyParametersClassType serde>>));
...
// see the example from the KafkaStreams javadocs for more KafkaStreams related details
final KafkaStreams streams = ...;
streams.start();
...
// get the queryable store name that is automatically created with the KTable
final String queryableStoreName = paramsTable.queryableStoreName();

// get access to the store; a KTable's store is timestamped, so values come back wrapped in ValueAndTimestamp
ReadOnlyKeyValueStore<String, ValueAndTimestamp<MyParametersClassType>> view =
    streams.store(queryableStoreName, QueryableStoreTypes.timestampedKeyValueStore());

// extract a record value from the store
MyParametersClassType parameter = view.get(key).value();

Accessing Pipeline within DoFn

I'm writing a pipeline to replicate data from one source to another. Info about the data sources is stored in a db (BQ). How can I use this data to build read/write endpoints dynamically?
I tried to pass the Pipeline object to my custom DoFn, but it can't be serialized. Later I tried to call getPipeline() on a passed view, but that doesn't work either -- which is actually expected.
I can't know all the tables I need to replicate in advance, so I have to read the metadata from the db (or any other source).
// builds some random view
PCollectionView<IdWrapper> idView = ...;

// reads table metadata and replicates data for each table
pipeline.apply(getTableMetaEndpont().read())
        .apply(ParDo.of(new MyCustomReplicator(idView)).withSideInputs(idView));

private static class MyCustomReplicator extends DoFn<TableMeta, TableMeta> {

    private final PCollectionView<IdWrapper> idView;

    private MyCustomReplicator(PCollectionView<IdWrapper> idView) {
        this.idView = idView;
    }

    // TableMeta {string: sourceTable, string: destTable}
    @ProcessElement
    public void processElement(@Element TableMeta tableMeta, ProcessContext ctx) {
        long id = ctx.sideInput(idView).getValue();

        // builds a read endpoint which depends on the table meta,
        // updates the entities and stores them using another endpoint
        idView
            .getPipeline()
            .apply(createReadEndpoint(tableMeta).read())
            .apply(ParDo.of(new SomeFunction(tableMeta, id)))
            .apply(createWriteEndpoint(tableMeta).insert());

        ctx.output(tableMeta);
    }
}
I expect it to replicate the data described by each TableMeta, but I can't use the pipeline within the DoFn because it can't be serialized/deserialized.
Is there any way to implement the intended behavior?

Kafka Streams persistent store error: the state store, may have migrated to another instance

I am using Kafka Streams with Spring Boot. In my use case, when I receive a customer event from another microservice I need to store it in a customer materialized view, and when I receive an order event I need to join the customer and the order and store the result in a customer-order materialized view. To achieve this I created a persistent key-value store customer-store and I update it when a new event arrives.
StoreBuilder customerStateStore =
    Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("customer"), Serdes.String(), customerSerde)
          .withLoggingEnabled(new HashMap<>());
streamsBuilder.addStateStore(customerStateStore);

KTable<String, Customer> customerKTable =
    streamsBuilder.table("customer", Consumed.with(Serdes.String(), customerSerde));
customerKTable.foreach((key, value) -> System.out.println("Customer from Topic: " + value));
I configured the topology, created the KafkaStreams object and started it. When I try to access the store via ReadOnlyKeyValueStore, I get the following exception, even though I stored some objects a few moments earlier:
streams.start();
ReadOnlyKeyValueStore<String, Customer> customerStore =
    streams.store("customer", QueryableStoreTypes.keyValueStore());
System.out.println("customerStore.approximateNumEntries() -> " + customerStore.approximateNumEntries());
Code uploaded to Github for reference. Appreciate your help.
Exception:
org.apache.kafka.streams.errors.InvalidStateStoreException: the state store, customer, may have migrated to another instance.
at org.apache.kafka.streams.state.internals.QueryableStoreProvider.getStore(QueryableStoreProvider.java:60)
at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:1043)
at com.kafkastream.service.EventsListener.main(EventsListener.java:94)
The state store usually needs some time to become ready. The simplest approach is shown below (code from the official documentation):
public static <T> T waitUntilStoreIsQueryable(final String storeName,
                                              final QueryableStoreType<T> queryableStoreType,
                                              final KafkaStreams streams) throws InterruptedException {
    while (true) {
        try {
            return streams.store(storeName, queryableStoreType);
        } catch (InvalidStateStoreException ignored) {
            // store not yet ready for querying
            Thread.sleep(100);
        }
    }
}
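With that helper in place, the lookup from the question would wait for the store instead of failing. A short usage sketch, reusing the store name "customer" from above:

streams.start();

// blocks until the "customer" store has been initialized/restored and is queryable
ReadOnlyKeyValueStore<String, Customer> customerStore =
    waitUntilStoreIsQueryable("customer", QueryableStoreTypes.keyValueStore(), streams);

System.out.println("customerStore.approximateNumEntries() -> " + customerStore.approximateNumEntries());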
You can find additional info in the documentation:
https://docs.confluent.io/current/streams/faq.html#interactive-queries

Getting state store data from a called function in Kafka Streams

In Kafka Streams' Processor API, can I pass the ProcessorContext from init() to another function, as shown below, and then get the context back together with the state store in process()?
public void init(ProcessorContext context) {
    this.context = context;

    String resourceName = "config.properties";
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    Properties props = new Properties();
    try (InputStream resourceStream = loader.getResourceAsStream(resourceName)) {
        props.load(resourceStream);
    } catch (IOException e) {
        e.printStackTrace();
    }

    dataSplitter.timerMessageSource(props, context); // can I pass the context like this?
    this.context.schedule(1000);

    // retrieve the key-value store named "patient"
    kvStore = (KeyValueStore<String, PatientDataSummary>) this.context.getStateStore("patient");

    // I want to read the state store values written by the called function timerMessageSource(),
    // since the data that is put into the state store is generated in timerMessageSource().
    // Is there any way I can get that by using the context or so?
}
The usage of ProcessorContext is somewhat limited and you cannot call every method it provides at arbitrary times. Thus, it depends on how you use it -- in general, you can pass it around as you wish (it will always be the same object throughout the lifetime of the processor).
If I understand your question correctly, you register a punctuation, use your dataSplitter within the punctuation callback, and want to modify the store there. That is absolutely possible -- you can either put the store into a class member, similarly to what you do with the context, or use the context object to get the store within the punctuate callback.
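A minimal sketch of the second option, getting the store inside the punctuation callback (the one-second interval and the commented-out logic are assumptions; the snippet uses the newer Punctuator-based schedule API rather than the deprecated schedule(long) shown above):

import java.time.Duration;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueStore;

// inside init(ProcessorContext context):
context.schedule(Duration.ofSeconds(1), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
    // look the store up from the context inside the callback...
    KeyValueStore<String, PatientDataSummary> store =
        (KeyValueStore<String, PatientDataSummary>) context.getStateStore("patient");

    // ...and modify it, e.g. with data produced by dataSplitter
    // store.put(someKey, someSummary);
});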