Replacing RocksDB with In-memory state store in Kafka Streams - apache-kafka

I'm using the Kafka Streams 0.10.1.1 release.
The RocksDB implementation for the state store can't handle our 50k msg/s rate, so I want to change the state store to the in-memory one. This should be possible according to the docs: http://docs.confluent.io/3.1.0/streams/architecture.html#state
However, when I implement this:
val stateStore = Stores.create(stateStoreName).withStringKeys().withStringValues().inMemory().build()
val procSuppl: KStreamAggregate = ... // I'll spare the implementation details
streamBuilder.addSource(
  "mysource",
  new StringDeserializer(),
  new StringDeserializer(),
  "input_topic"
).addProcessor("proc", procSuppl, "mysource").addStateStore(stateStore, "proc")
I end up with this error at runtime:
Caused by: java.lang.ClassCastException: org.apache.kafka.streams.state.internals.MeteredKeyValueStore cannot be cast to org.apache.kafka.streams.state.internals.CachedStateStore
2017-01-23T13:19:11.830674020Z at org.apache.kafka.streams.kstream.internals.KStreamAggregate$KStreamAggregateProcessor.init(KStreamAggregate.java:62)
The implementation of the above method is:
public void init(ProcessorContext context) {
    super.init(context);
    store = (KeyValueStore<K, T>) context.getStateStore(storeName);
    ((CachedStateStore) store).setFlushListener(new ForwardingCacheFlushListener<K, V>(context, sendOldValues));
}
Why is it trying to cast the state store to a CachedStateStore instance? How can I implement a simple in-memory state store, which should be possible according to the docs?
Thanks

In order to create an in-memory state store, one needs to create a store supplier (using the Stores factory object):
val storeSupplier = Stores.inMemoryKeyValueStore("in-mem")
Then you need to use the store supplier when materializing a KTable:
val wordCounts = builder
  .stream[String, String]("streams-plaintext-input")
  .flatMapValues(textLine => textLine.toLowerCase.split("\\W+"))
  .groupBy((_, word) => word)
  .count()(Materialized.as(storeSupplier))
Obtain the queryable store:
val qStore = streams.store(
  wordCounts.queryableStoreName,
  QueryableStoreTypes.keyValueStore[String, Long])
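Once you have qStore you can query it directly; a minimal sketch, in Java for brevity (the key "hello" is hypothetical):
// Point lookup; returns null if the key is absent from this instance's
// share of the store.
Long count = qStore.get("hello");

// Range over everything this instance holds; the iterator must be closed.
try (KeyValueIterator<String, Long> it = qStore.all()) {
    while (it.hasNext()) {
        KeyValue<String, Long> kv = it.next();
        System.out.println(kv.key + ": " + kv.value);
    }
}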

Related

Can the state stores in Kafka be shared across streams?

We have a scenario where a state store holding some values from one KStream needs to be accessed in another KStream. Is there any way to achieve this?
They can be accessed with Interactive Queries.
Between applications, or between instances of the same application, you need to expose the store via RPC, for example by adding an HTTP or gRPC server.
https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html
You can attach the same state store to multiple processors if you use the Processor API, but also if you use the Processor API Integration in the DSL.
There are two ways to do that (see javadocs). You can either manually add the store to the processors, like:
// create store
StoreBuilder<KeyValueStore<String, String>> keyValueStoreBuilder =
    Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("myProcessorState"),
        Serdes.String(),
        Serdes.String());
// add store
builder.addStateStore(keyValueStoreBuilder);
KStream outputStream = inputStream.process(new ProcessorSupplier() {
    public Processor get() {
        return new MyProcessor();
    }
}, "myProcessorState");
or you can implement stores() on the passed-in ProcessorSupplier:
class MyProcessorSupplier implements ProcessorSupplier {
    // supply processor
    public Processor get() {
        return new MyProcessor();
    }

    // provide store(s) that will be added and connected to the associated processor;
    // the store name from the builder ("myProcessorState") is used to access the
    // store later via the ProcessorContext
    public Set<StoreBuilder> stores() {
        StoreBuilder<KeyValueStore<String, String>> keyValueStoreBuilder =
            Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("myProcessorState"),
                Serdes.String(),
                Serdes.String());
        return Collections.singleton(keyValueStoreBuilder);
    }
}
These are examples for KStream#process(), but it works similarly for the family of KStream#*transform*() methods.
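For instance, the same "myProcessorState" store can be connected to a second processor by name. A sketch, assuming inputStream is a KStream<String, String> and reusing the store added above (the last-seen logic is hypothetical):
// Connect the same store to a ValueTransformer as well; both processors
// now read and write "myProcessorState".
KStream<String, String> transformed = inputStream.transformValues(
    () -> new ValueTransformer<String, String>() {
        private KeyValueStore<String, String> store;

        @Override
        @SuppressWarnings("unchecked")
        public void init(final ProcessorContext context) {
            store = (KeyValueStore<String, String>) context.getStateStore("myProcessorState");
        }

        @Override
        public String transform(final String value) {
            store.put("last", value); // hypothetical: remember the last value seen
            return value;
        }

        @Override
        public void close() {}
    },
    "myProcessorState");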

Adding data to state store for stateful processing and fault tolerance

I have a microservice that performs some stateful processing. The application constructs a KStream from an input topic, does some stateful processing, then writes data to the output topic.
I will be running 3 instances of this application in the same consumer group. There are 3 parameters that I need to store so that, when a microservice goes down, the instance that takes over can query the shared state store and continue where the crashed service left off.
I am thinking of pushing these 3 parameters into a state store and querying the data when the other microservice takes over. From my research I have seen many examples where people do event counting with a state store, but that's not exactly what I want. Does anyone know an example, or what the right approach for this problem is?
So you want to do 2 things:
a. the service going down has to store the parameters:
If you want to do it in a straightforward way, then all you have to do is write a message to the topic associated with the state store (the one you are reading with a KTable). Use the Kafka Producer API or a KStream (could be kTable.toStream()) to do it, and that's it.
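A minimal sketch of the straightforward approach with the Producer API, assuming the KTable reads a topic named "parametersTopicName" and the parameters are string-serialized (broker address and parameter names are placeholders):
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

// One record per parameter; the KTable treats each record as an upsert by key.
try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("parametersTopicName", "param1", "value1"));
    producer.send(new ProducerRecord<>("parametersTopicName", "param2", "value2"));
    producer.send(new ProducerRecord<>("parametersTopicName", "param3", "value3"));
}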
Otherwise, you could manually create a state store:
// take these serdes as just an example
Serde<String> keySerde = Serdes.String();
Serde<String> valueSerde = Serdes.String();
KeyValueBytesStoreSupplier storeSupplier = Stores.inMemoryKeyValueStore(stateStoreName);
streamsBuilder.addStateStore(Stores.keyValueStoreBuilder(storeSupplier, keySerde, valueSerde));
then use it in a transformer or processor to add items to it; you'll have to declare the store in the transformer/processor:
// depending on the serdes above you might have something other than String here
private KeyValueStore<String, String> stateStore;
and initialize the stateStore variable:
@Override
public void init(ProcessorContext context) {
    stateStore = (KeyValueStore<String, String>) context.getStateStore(stateStoreName);
}
and later use the stateStore variable:
@Override
public KeyValue<String, String> transform(String key, String value) {
    // using stateStore among other actions you might take here
    String processedValue = value; // derive the processed value from the input here
    stateStore.put(key, processedValue);
    return KeyValue.pair(key, processedValue);
}
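The transformer is then wired into the topology together with the store name. A short sketch; the topic name "input-topic" and the MyTransformer class (which would hold the fragments above) are hypothetical:
// Connect the transformer to the manually added store by passing its name.
KStream<String, String> input = streamsBuilder.stream("input-topic", Consumed.with(keySerde, valueSerde));
KStream<String, String> processed = input.transform(() -> new MyTransformer(), stateStoreName);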
b. read the parameters in the service taking over:
You could do it with a plain Kafka consumer, but with Kafka Streams you first have to make the store available for querying. The easiest way is to create a KTable over the parameters topic; then you get the queryable store name that is automatically created with the KTable, get access to the store, and finally extract a record value (i.e. a parameter value by its key).
// this example is a modified copy of the KTable javadocs example
final StreamsBuilder streamsBuilder = new StreamsBuilder();
// By creating a KTable over the topic containing your parameters,
// a store is automatically created.
//
// The serde for your MyParametersClassType could be
// new org.springframework.kafka.support.serializer.JsonSerde(MyParametersClassType.class),
// though further configuration might be necessary here, e.g. setting the
// trusted packages for the ObjectMapper behind JsonSerde.
//
// If the parameter value class is a String, you could use Serdes.String()
// instead of a MyParametersClassType serde.
final KTable<String, MyParametersClassType> paramsTable =
    streamsBuilder.table("parametersTopicName", Consumed.with(Serdes.String(), <<your MyParametersClassType serde>>));
...
// see the example from the KafkaStreams javadocs for more KafkaStreams-related details
final KafkaStreams streams = ...;
streams.start();
...
// get the queryable store name that is automatically created with the KTable
final String queryableStoreName = paramsTable.queryableStoreName();
// get access to the store; KTables are backed by timestamped stores,
// so values come back wrapped in ValueAndTimestamp
final ReadOnlyKeyValueStore<String, ValueAndTimestamp<MyParametersClassType>> view =
    streams.store(queryableStoreName, QueryableStoreTypes.timestampedKeyValueStore());
// extract a record value from the store (i.e. a parameter value by its key)
final MyParametersClassType parameter = view.get(key).value();

How to add a StateStore using StateStoreBuilder in a Spring Cloud Stream Kafka Streams application

The native Kafka API allows us to create and add a state store using the StreamsBuilder:
final StreamsBuilder builder = new StreamsBuilder();
...
final StoreBuilder<WindowStore<String, Long>> dedupStoreBuilder = Stores.windowStoreBuilder(
    Stores.persistentWindowStore(storeName,
        retentionPeriod,
        windowSize,
        false),
    Serdes.String(),
    Serdes.Long());
builder.addStateStore(dedupStoreBuilder);
I would like to do the same using Spring Cloud Streams but can't figure out the way to access the StreamsBuilder to add the store.
I've tried to retrieve the StreamsBuilderFactoryBean as stated in the docs, hoping that I could get the StreamsBuilder object from it, but the bean doesn't seem to be available:
@EnableBinding(KafkaStreamsProcessor::class)
class FraudKafkaStreamsConfiguration(private val context: ApplicationContext) {
    @StreamListener
    @SendTo("output")
    fun process(@Input("input") input: KStream<String, TransferEmitted>): KStream<String, TransferEmitted> {
        val streamsBuilderFactoryBean = context.getBean("&stream-builder-process", StreamsBuilderFactoryBean::class.java)
        ...
        return xxx
    }
}
Caused by: org.springframework.beans.factory.NoSuchBeanDefinitionException: No bean named 'stream-builder-process' available
In any case, I'm not even sure that's the right way to do it. So, how can we programmatically create a StateStore?
I didn't see the documented procedure because of my Spring Cloud Stream version (Fishtown SR3), but the good news is that it's possible to create the state store declaratively since Germantown:
const val DEDUP_STORE = "dedup-store"

@EnableBinding(KafkaStreamsProcessor::class)
class FraudKafkaStreamsConfiguration {
    @KafkaStreamsStateStore(name = DEDUP_STORE, type = KafkaStreamsStateStoreProperties.StoreType.KEYVALUE)
    @StreamListener
    @SendTo("output")
    fun process(@Input("input") input: KStream<String, TransferEmitted>): KStream<String, TransferEmitted> {
        return input.transform(TransformerSupplier { DeduplicationTransformer() }, DEDUP_STORE)
    }
}
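The DeduplicationTransformer itself is not shown in the question; a minimal sketch of what it could look like, in Java for brevity (the drop-if-seen logic and the "seen" marker value are hypothetical):
public class DeduplicationTransformer
        implements Transformer<String, TransferEmitted, KeyValue<String, TransferEmitted>> {

    private KeyValueStore<String, String> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        store = (KeyValueStore<String, String>) context.getStateStore("dedup-store");
    }

    @Override
    public KeyValue<String, TransferEmitted> transform(final String key, final TransferEmitted value) {
        if (store.get(key) != null) {
            return null; // key seen before: drop the duplicate
        }
        store.put(key, "seen");
        return KeyValue.pair(key, value);
    }

    @Override
    public void close() {}
}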

Kafka Streams (Scala): Invalid topology: StateStore is not added yet

I have a topology where I have a stream A.
From that stream A, I create a WindowedStore S.
A --> [S]
Then I want to transform the objects in A depending on data in S, and also have these transformed objects arrive at the WindowedStore logic (via transformValues).
For that, I create a Transformer, producing a stream A', and make the windowing aware of it (i.e. now S will be built from A', not from A).
A -> A' --> [S]
^__read__|
But I cannot do that, because when I create the Topology, an exception is thrown:
Caused by: org.apache.kafka.streams.errors.TopologyException: Invalid topology: StateStore storeName is not added yet.
Is there a way to work around this? Is this a limitation?
Code example:
// A
val sessionElementsStream: KStream[K, SessionElement] = ...

// A'
val sessionElementsTransformed: KStream[K, SessionElementTransformed] = {
  // Here we use the sessionStoreName - but it is not added yet to the Topology
  sessionElementsStream.
    transformValues(sessionElementTransformerSupplier, sessionStoreName)
}

val sessionElementsWindowedStream: SessionWindowedKStream[K, SessionElementTransformed] = {
  sessionElementsTransformed.
    groupByKey(sessionElementTransformedGroupedBy).
    windowedBy(sessionWindows)
}

val sessionStore: KTable[Windowed[K], List[WindowedSession]] =
  sessionElementsWindowedStream.aggregate(
    initializer = List.empty[WindowedSession])(
    aggregator = anAggregator, merger = aMerger)(materialized = getMaterializedMUPKSessionStore(sessionStoreName))
The original problem is that, depending on previous sessions' values, I would like to change later sessions. If I do this in a transformer after the sessioning, the transformed sessions can be changed and sent downstream, but they won't reflect their new state in S, so further requests to the store will return the old values.
Kafka Streams 2.1, Scala 2.12.4.
Co-partitioned topics.
UPDATE
There is a way to do this within the DSL, using an extra topic (a sketch follows these steps):
Send A' to this topic.
Create a builder.stream from this topic and build the store from it.
Define the store before you define the transformation (so the transformation step can use the store, because it is already defined).
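A compact sketch of these steps, in Java and with a simple count store for brevity (all topic and store names here are hypothetical):
final StreamsBuilder builder = new StreamsBuilder();

// Build the store S from the extra topic first, so that "counts-store"
// is part of the topology before the transformer below references it.
builder.stream("a-prime-topic", Consumed.with(Serdes.String(), Serdes.String()))
       .groupByKey()
       .count(Materialized.as("counts-store"));

// Transform A into A', reading "counts-store", and write A' back to the
// extra topic, which closes the loop.
builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
       .transformValues(() -> new ValueTransformer<String, String>() {
           // on Kafka 2.3+ this would be a TimestampedKeyValueStore instead
           private KeyValueStore<String, Long> store;

           @Override
           @SuppressWarnings("unchecked")
           public void init(final ProcessorContext context) {
               store = (KeyValueStore<String, Long>) context.getStateStore("counts-store");
           }

           @Override
           public String transform(final String value) {
               return value; // hypothetical: adjust the value based on store contents
           }

           @Override
           public void close() {}
       }, "counts-store")
       .to("a-prime-topic");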
However, it sounds cumbersome to have to use an extra topic here. Is there no other, simpler way to solve it?
But I cannot do that, because when I create the Topology, an exception is thrown:
Caused by: org.apache.kafka.streams.errors.TopologyException: Invalid topology: StateStore storeName is not added yet.
It looks like you simply forgot to literally "add" the state store to your processing topology, and then attach ("make available") the state store to your Transformer.
Here's a code snippet that demonstrates this (sorry, in Java).
Adding the state store to your topology:
final StreamsBuilder builder = new StreamsBuilder();
final StoreBuilder<KeyValueStore<String, Long>> myStateStore =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("my-state-store-name"),
        Serdes.String(),
        Serdes.Long())
    .withCachingEnabled();
builder.addStateStore(myStateStore);
Attaching the state store to your Transformer:
final KStream<String, Double> stream = builder.stream("your-input-topic", Consumed.with(Serdes.String(), Serdes.Double()));
final KStream<String, Long> transformedStream =
    stream.transform(new MyTransformer(myStateStore.name()), myStateStore.name());
And of course your Transformer must integrate the state store, with code like the following (this Transformer reads <String, Double> and writes <String, Long>).
class MyTransformer implements TransformerSupplier<String, Double, KeyValue<String, Long>> {

    private final String myStateStoreName;

    MyTransformer(final String myStateStoreName) {
        this.myStateStoreName = myStateStoreName;
    }

    @Override
    public Transformer<String, Double, KeyValue<String, Long>> get() {
        return new Transformer<String, Double, KeyValue<String, Long>>() {

            private KeyValueStore<String, Long> myStateStore;
            private ProcessorContext context;

            @Override
            public void init(final ProcessorContext context) {
                this.context = context;
                myStateStore = (KeyValueStore<String, Long>) context.getStateStore(myStateStoreName);
            }

            // ...
        };
    }
}

Kafka Streams persistent store error: the state store, may have migrated to another instance

I am using Kafka Streams with Spring Boot. In my use case, when I receive a customer event from another microservice I need to store it in a customer materialized view, and when I receive an order event I need to join the customer and the order, then store the result in a customer-order materialized view. To achieve this I created a persistent key-value store customer-store, which is updated when a new event comes in.
StoreBuilder customerStateStore = Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("customer"), Serdes.String(), customerSerde)
        .withLoggingEnabled(new HashMap<>());
streamsBuilder.addStateStore(customerStateStore);
KTable<String, Customer> customerKTable = streamsBuilder.table("customer", Consumed.with(Serdes.String(), customerSerde));
customerKTable.foreach((key, value) -> System.out.println("Customer from Topic: " + value));
I configured the topology and streams, and started the streams object. When I try to access the store using ReadOnlyKeyValueStore, I get the following exception, even though I stored some objects a few moments ago:
streams.start();
ReadOnlyKeyValueStore<String, Customer> customerStore = streams.store("customer", QueryableStoreTypes.keyValueStore());
System.out.println("customerStore.approximateNumEntries()-> " + customerStore.approximateNumEntries());
Code is uploaded to GitHub for reference. I appreciate your help.
Exception:
org.apache.kafka.streams.errors.InvalidStateStoreException: the state store, customer, may have migrated to another instance.
at org.apache.kafka.streams.state.internals.QueryableStoreProvider.getStore(QueryableStoreProvider.java:60)
at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:1043)
at com.kafkastream.service.EventsListener.main(EventsListener.java:94)
The state store usually needs some time to be prepared. The simplest approach is the one below (code from the official documentation):
public static <T> T waitUntilStoreIsQueryable(final String storeName,
                                              final QueryableStoreType<T> queryableStoreType,
                                              final KafkaStreams streams) throws InterruptedException {
    while (true) {
        try {
            return streams.store(storeName, queryableStoreType);
        } catch (InvalidStateStoreException ignored) {
            // store not yet ready for querying
            Thread.sleep(100);
        }
    }
}
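With that helper, the lookup from the question becomes (a sketch, reusing the question's store name and types):
// Blocks until the "customer" store is queryable, then reads from it.
ReadOnlyKeyValueStore<String, Customer> customerStore =
    waitUntilStoreIsQueryable("customer", QueryableStoreTypes.<String, Customer>keyValueStore(), streams);
System.out.println("customerStore.approximateNumEntries() -> " + customerStore.approximateNumEntries());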
You can find additional info in the documentation:
https://docs.confluent.io/current/streams/faq.html#interactive-queries