Getting state store data from a called function in Kafka Streams - apache-kafka

In Kafka Streams' Processor API, can I pass the processor context from init() to another function as shown below, and then get the context back with the state store in process()?
public void init(ProcessorContext context) {
    this.context = context;

    String resourceName = "config.properties";
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    Properties props = new Properties();
    try (InputStream resourceStream = loader.getResourceAsStream(resourceName)) {
        props.load(resourceStream);
    } catch (IOException e) {
        e.printStackTrace();
    }

    dataSplitter.timerMessageSource(props, context); // can I pass context like this?
    this.context.schedule(1000);

    // retrieve the key-value store named "patient"
    kvStore = (KeyValueStore<String, PatientDataSummary>) this.context.getStateStore("patient");

    // I want to read the state store values filled by the called function timerMessageSource(),
    // since the data to be put into the state store is generated in timerMessageSource().
    // Is there any way I can get that via the context or otherwise?
}

The usage of ProcessorContext is somewhat limited and you cannot call every method it provides at arbitrary times. Thus, it depends on how you use it -- in general, you can pass it around as you wish (it will always be the same object throughout the lifetime of the processor).
If I understand your question correctly, you register a punctuation and use your dataSplitter within the punctuation callback, and you want to modify the store there. That is absolutely possible -- you can either put the store into a class member, similar to what you do with the context, or use the context object to get the store within the punctuate callback.
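For illustration, here is a minimal sketch of that second idea, written against the current Processor API (the question's code uses an older schedule() variant); the class name, the String value type, and the one-second wall-clock interval are assumptions. The store is looked up in init() and then written from inside the punctuation callback.

import java.time.Duration;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

// Hypothetical processor: "patient" is the store name from the question,
// but the String value type is a placeholder.
class PatientProcessor implements Processor<String, String, String, String> {

    private ProcessorContext<String, String> context;
    private KeyValueStore<String, String> kvStore;

    @Override
    public void init(ProcessorContext<String, String> context) {
        this.context = context;
        // fetch the store once; it could also be re-fetched inside the callback
        this.kvStore = context.getStateStore("patient");
        // register a wall-clock punctuation that fires every second
        context.schedule(Duration.ofSeconds(1), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            // the callback can freely read and write the store,
            // e.g. call dataSplitter here and put its results into kvStore
            kvStore.put("lastPunctuation", String.valueOf(timestamp));
        });
    }

    @Override
    public void process(Record<String, String> record) {
        kvStore.put(record.key(), record.value());
        context.forward(record);
    }
}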

Related

KStream.processValues() - getting a null state store from FixedKeyProcessor

I have the following topology, which uses the processValues() method to combine the Streams DSL with the Processor API. I'm adding a state store here.
KStream<String, SecurityCommand> securityCommands =
    builder.stream(
        "security-command",
        Consumed.with(Serdes.String(), JsonSerdes.securityCommand()));

StoreBuilder<KeyValueStore<String, UserAccountSnapshot>> storeBuilder =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("user-account-snapshot"),
        Serdes.String(),
        JsonSerdes.userAccountSnapshot());

builder.addStateStore(storeBuilder);

securityCommands.processValues(() -> new SecurityCommandProcessor(), Named.as("security-command-processor"), "user-account-snapshot")
    .processValues(() -> new UserAccountSnapshotUpdater(), Named.as("user-snapshot-updater"), "user-account-snapshot")
    .to("security-event", Produced.with(
        Serdes.String(),
        JsonSerdes.userAccountEvent()));
The SecurityCommandProcessor code follows:
class SecurityCommandProcessor implements FixedKeyProcessor<String, SecurityCommand, UserAccountEvent> {

    private KeyValueStore<String, UserAccountSnapshot> kvStore;
    private FixedKeyProcessorContext context;

    @Override
    public void init(FixedKeyProcessorContext context) {
        this.kvStore = (KeyValueStore<String, UserAccountSnapshot>) context.getStateStore("user-account-snapshot");
        this.context = context;
    }
    ...
}
The problem is that context.getStateStore("user-account-snapshot") returns null.
I tried nearly the same code using the deprecated transformValues() and I'm able to get the state store. The problem is with processValues(). Am I doing anything wrong?
The issue is that you're using a lambda for the FixedKeyProcessorSupplier. When the processor needs access to a state store, you'll need to override the stores() method, which returns null when it's not overridden. The FixedKeyProcessorSupplier extends the ConnectedStoreProvider interface.
So you'll need to provide a concrete instance of the processor supplier.
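For illustration, such a concrete supplier might look like the sketch below; the class name is an assumption, while the serdes, store name, and processor come from the question. When the store is provided via stores(), you would typically drop the separate builder.addStateStore() call and the store-name argument to processValues() for that processor, since the store is added and connected automatically.

import java.util.Collections;
import java.util.Set;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.processor.api.FixedKeyProcessor;
import org.apache.kafka.streams.processor.api.FixedKeyProcessorSupplier;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

// Hypothetical supplier class; JsonSerdes, SecurityCommand, UserAccountEvent and
// UserAccountSnapshot are the question's own classes.
class SecurityCommandProcessorSupplier
        implements FixedKeyProcessorSupplier<String, SecurityCommand, UserAccountEvent> {

    @Override
    public FixedKeyProcessor<String, SecurityCommand, UserAccountEvent> get() {
        return new SecurityCommandProcessor();
    }

    // Connect the store to the processor through the supplier; Kafka Streams
    // adds the store to the topology and attaches it to this processor.
    @Override
    public Set<StoreBuilder<?>> stores() {
        return Collections.<StoreBuilder<?>>singleton(
            Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("user-account-snapshot"),
                Serdes.String(),
                JsonSerdes.userAccountSnapshot()));
    }
}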
Let me know how it goes.
HTH,
Bill

Can the state stores in Kafka be shared across streams?

We have a scenario where a state store populated with values from one KStream needs to be accessed in another KStream. Is there any way to achieve this?
They can be accessed with Interactive Queries.
Between applications or instances of the same application, you need to use RPC calls such as adding an HTTP or gRPC server.
https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html
You can attach the same state store to multiple processors if you use the Processor API, but also if you use the Processor API Integration in the DSL.
There are two ways to do that (see javadocs). You can either manually add the store to the processors, like:
// create store
StoreBuilder<KeyValueStore<String, String>> keyValueStoreBuilder =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("myProcessorState"),
        Serdes.String(),
        Serdes.String());

// add store
builder.addStateStore(keyValueStoreBuilder);

KStream outputStream = inputStream.process(new ProcessorSupplier() {
    public Processor get() {
        return new MyProcessor();
    }
}, "myProcessorState");
or you can implement stores() on the passed in ProcessorSupplier:
class MyProcessorSupplier implements ProcessorSupplier {

    // supply processor
    public Processor get() {
        return new MyProcessor();
    }

    // provide store(s) that will be added and connected to the associated processor;
    // the store name from the builder ("myProcessorState") is used to access the
    // store later via the ProcessorContext
    public Set<StoreBuilder> stores() {
        StoreBuilder<KeyValueStore<String, String>> keyValueStoreBuilder =
            Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("myProcessorState"),
                Serdes.String(),
                Serdes.String());
        return Collections.singleton(keyValueStoreBuilder);
    }
}
These are examples for KStream#process(), but it works similarly for the family of KStream#*transform*() methods.
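For completeness, here is a minimal sketch of what the hypothetical MyProcessor could look like, written against the newer typed Processor API (org.apache.kafka.streams.processor.api); it fetches the connected store by the builder's name inside init(), and the put/forward logic is just a placeholder.

import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

// Sketch only: the store name must match the one used in the StoreBuilder ("myProcessorState").
class MyProcessor implements Processor<String, String, String, String> {

    private KeyValueStore<String, String> store;
    private ProcessorContext<String, String> context;

    @Override
    public void init(ProcessorContext<String, String> context) {
        this.context = context;
        this.store = context.getStateStore("myProcessorState");
    }

    @Override
    public void process(Record<String, String> record) {
        // every processor connected to "myProcessorState" sees the same store
        store.put(record.key(), record.value());
        context.forward(record);
    }
}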

Adding data to state store for stateful processing and fault tolerance

I have a microservice that performs some stateful processing. The application constructs a KStream from an input topic, does some stateful processing, then writes data to the output topic.
I will be running 3 of these applications in the same group. There are 3 parameters that I need to store so that, in the event the microservice goes down, the microservice that takes over can query the shared state store and continue where the crashed service left off.
I am thinking of pushing these 3 parameters into a state store and querying the data when the other microservice takes over. From my research, I have seen a lot of examples where people perform event counting using a state store, but that's not exactly what I want. Does anyone know an example, or what is the right approach for this problem?
So you want to do 2 things:
a. the service going down has to store the parameters:
If you want to do it in a straightforward way, then all you have to do is write a message to the topic associated with the state store (the one you are reading with a KTable). Use the Kafka Producer API or a KStream (could be kTable.toStream()) to do it, and that's it.
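As a minimal sketch of the producer variant, assuming the parameters topic is named "parametersTopicName", the broker runs on localhost:9092, and the parameters are serialized as strings (all assumptions):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Hypothetical helper: writes one parameter to the topic backing the KTable.
public class ParameterWriter {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // the key identifies the parameter, the value is its latest state
            producer.send(new ProducerRecord<>("parametersTopicName", "checkpointOffset", "42"));
            producer.flush();
        }
    }
}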
Otherwise you could manually create a state store:
// take these serdes as just an example
Serde<String> keySerde = Serdes.String();
Serde<String> valueSerde = Serdes.String();
KeyValueBytesStoreSupplier storeSupplier = Stores.inMemoryKeyValueStore(stateStoreName);
streamsBuilder.addStateStore(Stores.keyValueStoreBuilder(storeSupplier, keySerde, valueSerde));
then use it in a transformer or processor to add items to it; you'll have to declare this in the transformer/processor:
// depending on the serdes above you might have something other than String
private KeyValueStore<String, String> stateStore;
and initialize the stateStore variable:
@Override
public void init(ProcessorContext context) {
    stateStore = (KeyValueStore<String, String>) context.getStateStore(stateStoreName);
}
and later use the stateStore variable:
@Override
public KeyValue<String, String> transform(String key, String value) {
    // use stateStore among other actions you might take here
    stateStore.put(key, processedValue);
    return new KeyValue<>(key, processedValue);
}
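As a usage sketch (MyTransformer, inputStream, and "outputTopicName" are placeholder names not in the original answer), wiring the transformer to the store added above could look like this; passing stateStoreName as the vararg is what makes getStateStore(stateStoreName) work inside init():

// Hypothetical wiring of the transformer declaring the fields above
KStream<String, String> output =
    inputStream.transform(() -> new MyTransformer(), stateStoreName);
output.to("outputTopicName"); // assumption: some output topic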
b. read the parameters in the service taking over:
You could do it with a plain Kafka consumer, but with Kafka Streams you first have to make the store available. The easiest way to do that is by creating a KTable; then you get the queryable store name that is automatically created with the KTable, obtain access to the store, and finally extract a record value from the store (i.e. a parameter value by its key).
// this example is a modified copy of the KTable javadocs example
final StreamsBuilder streamsBuilder = new StreamsBuilder();

// When creating a KTable over the topic containing your parameters, a store is created automatically.
//
// The serde for your MyParametersClassType could be
// new org.springframework.kafka.support.serializer.JsonSerde(MyParametersClassType.class),
// though further configuration might be necessary here - e.g. setting the trusted packages
// for the ObjectMapper behind JsonSerde.
//
// If the parameter-value class is a String then you could use Serdes.String() instead of a
// MyParametersClassType serde.
final KTable paramsTable = streamsBuilder.table("parametersTopicName",
    Consumed.with(Serdes.String(), <<your InstanceOfMyParametersClassType serde>>));
...
// see the example from the KafkaStreams javadocs for more KafkaStreams related details
final KafkaStreams streams = ...;
streams.start();
...
// get the queryable store name that is automatically created with the KTable
final String queryableStoreName = paramsTable.queryableStoreName();
// get access to the store; the KTable is backed by a timestamped key-value store
ReadOnlyKeyValueStore<String, ValueAndTimestamp<InstanceOfMyParametersClassType>> view =
    streams.store(queryableStoreName, QueryableStoreTypes.timestampedKeyValueStore());
// extract a record value from the store and unwrap it from its ValueAndTimestamp wrapper
InstanceOfMyParametersClassType parameter = view.get(key).value();

How can I access the acknowledgement entity in Spring Cloud SQS without setting custom argument resolvers?

In order to have access to events like S3EventNotification, we need to specify a custom argument resolver in the QueueMessageHandlerFactory. But since the order in which those argument resolvers are evaluated matters, it forces me to keep a list that declares every default argument resolver a second time. Is it possible to avoid this?
I am trying to read from a queue where events are generated by Amazon itself.
In this case I need to set
messageConverter.setStrictContentTypeMatch(false);
as explained here: https://cloud.spring.io/spring-cloud-aws/1.2.x/multi/multi__messaging.html#_consuming_aws_event_messages_with_amazon_sqs
In the method, however, I needed to use Acknowledgment, Visibility, and header method parameters, but those were not passed correctly unless I redefined all the possible argument resolvers in the configuration.
So to have the following method signature:
@SqsListener(value = "${my-queue-name}", deletionPolicy = NEVER)
public void processRequest(
        @Payload S3EventNotification s3EventNotificationRecord,
        @Header("ApproximateReceiveCount") final int receiveCount,
        Acknowledgment acknowledgment,
        Visibility visibility) {
    // do some stuff and decide to acknowledge or extend visibility
}
I was forced to write a custom configuration like this:
@Configuration
public class AmazonSQSConfig {

    private static final String ACKNOWLEDGMENT = "Acknowledgment";
    private static final String VISIBILITY = "Visibility";

    @Bean
    public QueueMessageHandlerFactory queueMessageHandlerFactory() {
        QueueMessageHandlerFactory factory = new QueueMessageHandlerFactory();
        factory.setArgumentResolvers(initArgumentResolvers());
        return factory;
    }

    private List<HandlerMethodArgumentResolver> initArgumentResolvers() {
        MappingJackson2MessageConverter messageConverter = new MappingJackson2MessageConverter();
        messageConverter.setStrictContentTypeMatch(false);

        return List.of(
            new HeaderMethodArgumentResolver(null, null),
            new HeadersMethodArgumentResolver(),
            new NotificationSubjectArgumentResolver(),
            new AcknowledgmentHandlerMethodArgumentResolver(ACKNOWLEDGMENT),
            new VisibilityHandlerMethodArgumentResolver(VISIBILITY),
            new PayloadArgumentResolver(messageConverter));
    }
}
I would expect there to be a way to define a custom argument resolver while still having all the default arguments passed to the method when it is executed.

Accessing Pipeline within DoFn

I'm writing a pipeline to replicate data from one source to another. Info about the data sources is stored in a DB (BigQuery). How can I use this data to build read/write endpoints dynamically?
I tried to pass the Pipeline object to my custom DoFn, but it can't be serialized. Later I tried to call getPipeline() on a passed view, but that doesn't work either -- which is actually expected.
I can't know all the tables I need to replicate in advance, so I have to read that data from the DB (or some other source).
// builds some random view
PCollectionView<IdWrapper> idView = ...;

// reads table metadata and replicates data for each table
pipeline.apply(getTableMetaEndpont().read())
    .apply(ParDo.of(new MyCustomReplicator(idView)).withSideInputs(idView));
private static class MyCustomReplicator extends DoFn<TableMeta, TableMeta> {

    private final PCollectionView<IdWrapper> idView;

    private MyCustomReplicator(PCollectionView<IdWrapper> idView) {
        this.idView = idView;
    }

    // TableMeta {string: sourceTable, string: destTable}
    @ProcessElement
    public void processElement(@Element TableMeta tableMeta, ProcessContext ctx) {
        long id = ctx.sideInput(idView).getValue();

        // builds a read endpoint which depends on the table meta
        // updates entities
        // stores entities using another endpoint
        idView
            .getPipeline()
            .apply(createReadEndpoint(tableMeta).read())
            .apply(ParDo.of(new SomeFunction(tableMeta, id)))
            .apply(createWriteEndpoint(tableMeta).insert());

        ctx.output(tableMeta);
    }
}
I expect it to replicate the data specified by each TableMeta, but I can't use the pipeline within the DoFn because it can't be serialized/deserialized.
Is there any way to implement the intended behavior?