How to add a StateStore using StateStoreBuilder in a Spring Cloud Stream Kafka Streams application

The native Kafka API allows us to create and add a state store using the StreamsBuilder:
final StreamsBuilder builder = new StreamsBuilder();
...
final StoreBuilder<WindowStore<String, Long>> dedupStoreBuilder = Stores.windowStoreBuilder(
    Stores.persistentWindowStore(storeName,
        retentionPeriod,
        windowSize,
        false
    ),
    Serdes.String(),
    Serdes.Long());
builder.addStateStore(dedupStoreBuilder);
I would like to do the same using Spring Cloud Stream, but I can't figure out how to access the StreamsBuilder to add the store.
I've tried to retrieve the StreamsBuilderFactoryBean as stated in the docs, hoping that I could get the StreamsBuilder object from it, but the bean doesn't seem to be available:
@EnableBinding(KafkaStreamsProcessor::class)
class FraudKafkaStreamsConfiguration(private val context: ApplicationContext) {

    @StreamListener
    @SendTo("output")
    fun process(@Input("input") input: KStream<String, TransferEmitted>): KStream<String, TransferEmitted> {
        val streamsBuilderFactoryBean = context.getBean("&stream-builder-process", StreamsBuilderFactoryBean::class.java)
        ...
        return xxx
    }
}
Caused by: org.springframework.beans.factory.NoSuchBeanDefinitionException: No bean named 'stream-builder-process' available
In any case, I'm not even sure that's the right way to do it. So, how can we programmatically create a StateStore?

I didn't see the documented procedure because of my SCSt version (Fishtown SR3), but the good news is that it's possible to create the state store declaratively since Germantown:
const val DEDUP_STORE = "dedup-store"
@EnableBinding(KafkaStreamsProcessor::class)
class FraudKafkaStreamsConfiguration {

    @KafkaStreamsStateStore(name = DEDUP_STORE, type = KafkaStreamsStateStoreProperties.StoreType.KEYVALUE)
    @StreamListener
    @SendTo("output")
    fun process(@Input("input") input: KStream<String, TransferEmitted>): KStream<String, TransferEmitted> {
        return input.transform(TransformerSupplier { DeduplicationTransformer() }, DEDUP_STORE)
    }
}
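For reference, the DeduplicationTransformer could look roughly like this (a minimal sketch in Java, not from the original post; it drops a record whenever its key is already present in the store):
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

class DeduplicationTransformer implements Transformer<String, TransferEmitted, KeyValue<String, TransferEmitted>> {

    private KeyValueStore<String, TransferEmitted> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        // "dedup-store" matches the DEDUP_STORE constant above
        store = (KeyValueStore<String, TransferEmitted>) context.getStateStore("dedup-store");
    }

    @Override
    public KeyValue<String, TransferEmitted> transform(final String key, final TransferEmitted value) {
        if (store.get(key) != null) {
            return null; // already seen this key: drop the duplicate
        }
        store.put(key, value);
        return KeyValue.pair(key, value);
    }

    @Override
    public void close() {
        // nothing to clean up; the store is owned by Kafka Streams
    }
}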

Related

KStream.processValues() - getting a null state store from FixedKeyProcessor

I have the following topology, which uses the processValues() method to combine the Streams DSL with the Processor API. I'm adding a state store here.
KStream<String, SecurityCommand> securityCommands =
    builder.stream(
        "security-command",
        Consumed.with(Serdes.String(), JsonSerdes.securityCommand()));

StoreBuilder<KeyValueStore<String, UserAccountSnapshot>> storeBuilder =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("user-account-snapshot"),
        Serdes.String(),
        JsonSerdes.userAccountSnapshot());

builder.addStateStore(storeBuilder);

securityCommands.processValues(() -> new SecurityCommandProcessor(), Named.as("security-command-processor"), "user-account-snapshot")
    .processValues(() -> new UserAccountSnapshotUpdater(), Named.as("user-snapshot-updater"), "user-account-snapshot")
    .to("security-event", Produced.with(
        Serdes.String(),
        JsonSerdes.userAccountEvent()));
The SecurityCommandProcessor code follows:
class SecurityCommandProcessor implements FixedKeyProcessor<String, SecurityCommand, UserAccountEvent> {

    private KeyValueStore<String, UserAccountSnapshot> kvStore;
    private FixedKeyProcessorContext context;

    @Override
    public void init(FixedKeyProcessorContext context) {
        this.kvStore = (KeyValueStore<String, UserAccountSnapshot>) context.getStateStore("user-account-snapshot");
        this.context = context;
    }
    ...
}
The problem is that context.getStateStore("user-account-snapshot") returns null.
I tried nearly the same code using the deprecated transformValues(), and there I'm able to get the state store. The problem is with processValues(). Am I doing anything wrong?
The issue is that you're using a lambda for the FixedKeyProcessorSupplier. When the processor needs access to a state store, you need to override the stores() method, which returns null when it's not overridden. The FixedKeyProcessorSupplier interface extends the ConnectedStoreProvider interface.
So you'll need to provide a concrete instance of the processor supplier.
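For example, a minimal sketch (reusing the names from your question; JsonSerdes is your own helper class):
import java.util.Collections;
import java.util.Set;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.processor.api.FixedKeyProcessor;
import org.apache.kafka.streams.processor.api.FixedKeyProcessorSupplier;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

class SecurityCommandProcessorSupplier
        implements FixedKeyProcessorSupplier<String, SecurityCommand, UserAccountEvent> {

    @Override
    public FixedKeyProcessor<String, SecurityCommand, UserAccountEvent> get() {
        return new SecurityCommandProcessor();
    }

    // Overriding stores() (inherited from ConnectedStoreProvider) lets
    // Kafka Streams add the store to the topology and connect it to this
    // processor automatically.
    @Override
    public Set<StoreBuilder<?>> stores() {
        final StoreBuilder<?> storeBuilder =
            Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("user-account-snapshot"),
                Serdes.String(),
                JsonSerdes.userAccountSnapshot());
        return Collections.singleton(storeBuilder);
    }
}
With the stores() override in place, the builder.addStateStore() call and the store-name argument to processValues() are no longer needed, since Kafka Streams adds and connects the store automatically.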
Let me know how it goes.
HTH,
Bill

Kafka Streams: Define multiple Kafka Streams using Spring Cloud Stream for each set of topics

I am trying to do a simple POC with Kafka Streams, however I am getting an exception while starting the application. I am using Spring Kafka and Kafka Streams 2.5.1 with Spring Boot 2.3.5.
Kafka Streams configuration:
@Configuration
public class KafkaStreamsConfig {

    private static final Logger log = LoggerFactory.getLogger(KafkaStreamsConfig.class);

    @Bean
    public Function<KStream<String, String>, KStream<String, String>> processAAA() {
        return input -> input.peek((key, value) -> log
            .info("AAA Cloud Stream Kafka Stream processing : {}", input.toString().length()));
    }

    @Bean
    public Function<KStream<String, String>, KStream<String, String>> processBBB() {
        return input -> input.peek((key, value) -> log
            .info("BBB Cloud Stream Kafka Stream processing : {}", input.toString().length()));
    }

    @Bean
    public Function<KStream<String, String>, KStream<String, String>> processCCC() {
        return input -> input.peek((key, value) -> log
            .info("CCC Cloud Stream Kafka Stream processing : {}", input.toString().length()));
    }

    /*
    @Bean
    public KafkaStreams kafkaStreams(KafkaProperties kafkaProperties) {
        final Properties props = new Properties();
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaProperties.getBootstrapServers());
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "groupId-1");
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, JsonSerde.class);
        props.put(JsonDeserializer.VALUE_DEFAULT_TYPE, JsonNode.class);
        final KafkaStreams kafkaStreams = new KafkaStreams(kafkaStreamTopology(), props);
        kafkaStreams.start();
        return kafkaStreams;
    }

    @Bean
    public Topology kafkaStreamTopology() {
        final StreamsBuilder streamsBuilder = new StreamsBuilder();
        streamsBuilder.stream(Arrays.asList(AAATOPIC, BBBInputTOPIC, CCCInputTOPIC));
        return streamsBuilder.build();
    }
    */
}
The application.yaml is configured as below. The idea is that I have 3 input and 3 output topics;
each function takes messages from its input topic and sends output to its output topic.
spring:
  application.name: consumerapp-1
  cloud:
    function:
      definition: processAAA;processBBB;processCCC
    stream:
      kafka.binder:
        brokers: 127.0.0.1:9092
        autoCreateTopics: true
        auto-add-partitions: true
      kafka.streams.binder:
        configuration:
          commit.interval.ms: 1000
          default.key.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
          default.value.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
      bindings:
        processAAA-in-0:
          destination: aaaInputTopic
        processAAA-out-0:
          destination: aaaOutputTopic
        processBBB-in-0:
          destination: bbbInputTopic
        processBBB-out-0:
          destination: bbbOutputTopic
        processCCC-in-0:
          destination: cccInputTopic
        processCCC-out-0:
          destination: cccOutputTopic
The exception thrown is:
Caused by: java.lang.IllegalArgumentException: Trying to prepareConsumerBinding public abstract void org.apache.kafka.streams.kstream.KStream.to(java.lang.String,org.apache.kafka.streams.kstream.Produced) but no delegate has been set.
at org.springframework.util.Assert.notNull(Assert.java:201)
at org.springframework.cloud.stream.binder.kafka.streams.KStreamBoundElementFactory$KStreamWrapperHandler.invoke(KStreamBoundElementFactory.java:134)
Can anyone help me with Kafka Streams Spring-Kafka code samples for processing with multiple input and output topics?
Update: 21-Jan-2021
After removing the kafkaStreams and kafkaStreamTopology bean configurations, I am getting the message below in an infinite loop. Message consumption is still not working. I have checked the subscriptions in application.yaml against the @Bean function definitions; they all look OK to me, but I still get this cross-wiring error. I have replaced application.properties with the application.yaml above.
2021-01-21 14:12:43,336 WARN org.apache.kafka.clients.consumer.internals.ConsumerCoordinator [consumerapp-1-75eec5e5-2772-4999-acf2-e9ef1e69f100-StreamThread-1] [Consumer clientId=consumerapp-1-75eec5e5-2772-4999-acf2-e9ef1e69f100-StreamThread-1-consumer, groupId=consumerapp-1] We received an assignment [cccParserTopic-0] that doesn't match our current subscription Subscribe(bbbParserTopic); it is likely that the subscription has changed since we joined the group. Will try re-join the group with current subscription
I have managed to solve the problem. I am writing this for the benefit of others.
If you want to include multiple streams in a single app jar, the key is to define a separate application ID for each of your streams. I knew this all along, but I was not aware of how to define it; the answer is something I managed to dig out after reading the SCSt documentation. Below is how the application.yaml can be defined:
spring:
  application.name: kafkaMultiStreamConsumer
  cloud:
    function:
      # needed for the imperative @StreamListener style
      definition: processAAA; processBBB; processCCC
    stream:
      kafka:
        binder:
          brokers: 127.0.0.1:9092
          min-partition-count: 3
          replication-factor: 2
          transaction:
            transaction-id-prefix: transaction-id-2000
          autoCreateTopics: true
          auto-add-partitions: true
        streams:
          binder:
            # needed for the functional style
            functions:
              processBBB:
                application-id: SampleBBBapplication
              processAAA:
                application-id: SampleAAAapplication
              processCCC:
                application-id: SampleCCCapplication
            configuration:
              commit.interval.ms: 1000
              default.key.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
              default.value.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
      bindings:
        # Below is for the imperative style of programming, using
        # the @StreamListener and @SendTo annotations in the .java class.
        inputAAA:
          destination: aaaInputTopic
        outputAAA:
          destination: aaaOutputTopic
        inputBBB:
          destination: bbbInputTopic
        outputBBB:
          destination: bbbOutputTopic
        inputCCC:
          destination: cccInputTopic
        outputCCC:
          destination: cccOutputTopic
        # Below is for the functional style using Function<KStream...>.
        # Use either one of the two styles, as both are not required. If you
        # use both it's OK, but only one of them works; from what I have seen,
        # @StreamListener is always the one triggered.
        processAAA-in-0:
          destination: aaaInputTopic
          group: processAAA-group
        processAAA-out-0:
          destination: aaaOutputTopic
          group: processAAA-group
        processBBB-in-0:
          destination: bbbInputTopic
          group: processBBB-group
        processBBB-out-0:
          destination: bbbOutputTopic
          group: processBBB-group
        processCCC-in-0:
          destination: cccInputTopic
          group: processCCC-group
        processCCC-out-0:
          destination: cccOutputTopic
          group: processCCC-group
Once the above is defined, we now need to define the individual Java classes where the stream processing logic is implemented.
Your Java class can be something like the one below; create similar classes for the other 2 (or N) streams as per your requirement. One example, AAASampleStreamTask.java:
@Component
@EnableBinding(AAASampleChannel.class) // One channel interface corresponding to the in-topic and out-topic
public class AAASampleStreamTask {

    private static final Logger log = LoggerFactory.getLogger(AAASampleStreamTask.class);

    @StreamListener(AAASampleChannel.INPUT)
    @SendTo(AAASampleChannel.OUTPUT)
    public KStream<String, String> processAAA(KStream<String, String> input) {
        input.foreach((key, value) -> log.info("Annotation AAA *Sample* Cloud Stream Kafka Stream processing {}", String.valueOf(System.currentTimeMillis())));
        ...
        // do other business logic
        ...
        return input;
    }

    /**
     * Use either the above or the below. The below style is the newer one,
     * starting from SCSt 3.0 if I am not wrong. These are 2 different styles
     * of consuming Kafka Streams using SCSt. If we have both, the above gets
     * priority, as per my observation.
     */
    /*
    @Bean
    public Function<KStream<String, String>, KStream<String, String>> processAAA() {
        return input -> input.peek((key, value) -> log.info(
            "Functional AAA *Sample* Cloud Stream Kafka Stream processing : {}", String.valueOf(System.currentTimeMillis())));
        ...
        // do other business logic
        ...
    }
    */
}
The channel interface is required only if you want to go with the imperative style of programming, not for the functional style.
AAASampleChannel.java
public interface AAASampleChannel {

    String INPUT = "inputAAA";
    String OUTPUT = "outputAAA";

    @Input(INPUT)
    KStream<String, String> inputAAA();

    @Output(OUTPUT)
    KStream<String, String> outputAAA();
}
Looks like you are mixing Spring Cloud Stream and Spring Kafka in the application. When using the binder, you don't need to directly define components required by Spring Kafka, such as KafkaStreams and Topology; they are created by SCSt implicitly. Can you remove the following beans and try again?
@Bean
public KafkaStreams kafkaStreams(KafkaProperties kafkaProperties) {
and
@Bean
public Topology kafkaStreamTopology() {
If you are still facing issues, please share a small reproducible sample; that way we can triage it further.

Kafka Streams (Scala): Invalid topology: StateStore is not added yet

I have a topology where I have a stream A.
From that stream A, I create a WindowedStore S.
A --> [S]
Then I want to transform the objects in A depending on the data in S, and have these transformed objects reach the WindowedStore logic (via transformValues).
For that, I create a Transformer producing a stream A', and make the windowing consume A' instead of A (i.e. now, S will be made from A', not from A).
A -> A' --> [S]
     ^__read__|
But I cannot do that, because when I create the Topology, an exception is thrown:
Caused by: org.apache.kafka.streams.errors.TopologyException: Invalid topology: StateStore storeName is not added yet.
Is there a way to work this around? Is this a limitation?
Code example:
// A
val sessionElementsStream: KStream[K, SessionElement] = ...

// A'
val sessionElementsTransformed: KStream[K, SessionElementTransformed] = {
  // Here we use the sessionStoreName - but it is not added yet to the Topology
  sessionElementsStream.
    transformValues(sessionElementTransformerSupplier, sessionStoreName)
}

val sessionElementsWindowedStream: SessionWindowedKStream[K, SessionElementTransformed] = {
  sessionElementsTransformed.
    groupByKey(sessionElementTransformedGroupedBy).
    windowedBy(sessionWindows)
}

val sessionStore: KTable[Windowed[K], List[WindowedSession]] =
  sessionElementsWindowedStream.aggregate(
    initializer = List.empty[WindowedSession])(
    aggregator = anAggregator, merger = aMerger)(materialized = getMaterializedMUPKSessionStore(sessionStoreName))
The original problem is that, depending on previous sessions' values, I would like to change the sessions that come after them. But if I do this in a transformer after the sessioning, the transformed sessions can be changed and sent downstream, yet they won't reflect their new state in S, so further requests to the store will return the old values.
Kafka Streams 2.1, Scala 2.12.4.
Co-partitioned topics.
UPDATE
There is a way to do this within the DSL, using an extra topic:
Send A' to this topic.
Create a builder.stream from this topic and build the store from it.
Define the store before you define the transformation, so the transformation step can use the store, because it is already defined (see the sketch below).
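In code, these steps look roughly like this (a Java sketch; the intermediate topic name "session-elements-transformed" and the sessionStoreBuilder variable are hypothetical, and serdes are left to the defaults):
// Step 3: define the store before the transformation that uses it.
builder.addStateStore(sessionStoreBuilder);

// Step 1: transform A into A' and send it to the extra topic.
sessionElementsStream
    .transformValues(sessionElementTransformerSupplier, sessionStoreName)
    .to("session-elements-transformed");

// Step 2: re-read A' from that topic; the windowed store S is then
// built from this re-read stream instead of directly from A'.
KStream<K, SessionElementTransformed> sessionElementsReread =
    builder.stream("session-elements-transformed");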
However, it sounds cumbersome to have to use an extra topic here. Is there no other, simpler way to solve it?
But I cannot do that, because when I create the Topology, an exception is thrown:
Caused by: org.apache.kafka.streams.errors.TopologyException: Invalid topology: StateStore storeName is not added yet.
It looks like you simply forgot to literally "add" the state store to your processing topology, and then attach ("make available") the state store to your Transformer.
Here's a code snippet that demonstrates this (sorry, in Java).
Adding the state store to your topology:
final StreamsBuilder builder = new StreamsBuilder();

final StoreBuilder<KeyValueStore<String, Long>> myStateStore =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("my-state-store-name"),
        Serdes.String(),
        Serdes.Long())
    .withCachingEnabled();

builder.addStateStore(myStateStore);
Attaching the state store to your Transformer:
final KStream<String, Double> stream = builder.stream("your-input-topic", Consumed.with(Serdes.String(), Serdes.Double()));
final KStream<String, Long> transformedStream =
    stream.transform(new MyTransformer(myStateStore.name()), myStateStore.name());
And of course your Transformer must integrate the state store, with code like the following (this Transformer reads <String, Double> and writes <String, Long>).
class MyTransformer implements TransformerSupplier<String, Double, KeyValue<String, Long>> {

    private final String myStateStoreName;

    MyTransformer(final String myStateStoreName) {
        this.myStateStoreName = myStateStoreName;
    }

    @Override
    public Transformer<String, Double, KeyValue<String, Long>> get() {
        return new Transformer<String, Double, KeyValue<String, Long>>() {

            private KeyValueStore<String, Long> myStateStore;
            private ProcessorContext context;

            @Override
            public void init(final ProcessorContext context) {
                myStateStore = (KeyValueStore<String, Long>) context.getStateStore(myStateStoreName);
                this.context = context;
            }

            // ...
        };
    }
}

Kafka Streams dynamic routing (ProducerInterceptor might be a solution?)

I'm working with Apache Kafka and I've been experimenting with the Kafka Streams functionality.
What I'm trying to achieve is very simple, at least in words, and it can be achieved easily with the regular plain Consumer/Producer approach:
Read from a dynamic list of topics
Do some processing on the message
Push the message to another topic whose name is computed based on the message content
Initially I thought I could create a custom Sink or inject some kind of endpoint resolver in order to programmatically define the topic name for each single message, although ultimately I couldn't find any way to do that.
So I dug into the code and found the ProducerInterceptor interface, which is (quoting from the JavaDoc):
A plugin interface that allows you to intercept (and possibly mutate) the records received by the producer before they are published to the Kafka cluster.
And its onSend method:
This is called from KafkaProducer.send(ProducerRecord) and KafkaProducer.send(ProducerRecord, Callback) methods, before key and value get serialized and partition is assigned (if partition is not specified in ProducerRecord).
It seemed like the perfect solution for me, as I can effectively return a new ProducerRecord with the topic name I want.
Although apparently there's a bug (I've opened an issue on their JIRA: KAFKA-4691): that method is called when the key and value have already been serialized.
Bummer, as I don't think doing an additional deserialization at this point is acceptable.
My question to you more experienced and knowledgeable users is: what would be an efficient and elegant way of achieving this? Any input, ideas, or suggestions are welcome.
Thanks in advance for your help/comments/suggestions/ideas.
Below are some code snippets of what I've tried:
public static void main(String[] args) throws Exception {
    StreamsConfig streamingConfig = new StreamsConfig(getProperties());
    StringDeserializer stringDeserializer = new StringDeserializer();
    StringSerializer stringSerializer = new StringSerializer();
    MyObjectSerializer myObjectSerializer = new MyObjectSerializer();

    TopologyBuilder topologyBuilder = new TopologyBuilder();
    topologyBuilder.addSource("SOURCE", stringDeserializer, myObjectSerializer, Pattern.compile("input-.*"))
                   .addProcessor("PROCESS", MyCustomProcessor::new, "SOURCE");

    System.out.println("Starting PurchaseProcessor Example");
    KafkaStreams streaming = new KafkaStreams(topologyBuilder, streamingConfig);
    streaming.start();
    System.out.println("Now started PurchaseProcessor Example");
}

private static Properties getProperties() {
    Properties props = new Properties();
    .....
    .....
    props.put(StreamsConfig.producerPrefix(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG), "com.test.kafka.streams.OutputTopicRouterInterceptor");
    return props;
}
The OutputTopicRouterInterceptor onSend implementation:
@Override
public ProducerRecord<String, MyObject> onSend(ProducerRecord<String, MyObject> record) {
    MyObject obj = record.value();
    String topic = computeTopicName(obj);
    ProducerRecord<String, MyObject> newRecord = new ProducerRecord<String, MyObject>(topic, record.partition(), record.timestamp(), record.key(), obj);
    return newRecord;
}
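A note for readers on newer versions: since Kafka Streams 2.0, the DSL supports dynamic routing directly via KStream#to(TopicNameExtractor), so the interceptor workaround is no longer necessary there. A minimal sketch (reusing computeTopicName from the snippet above; myObjectSerde is an assumed serde for the value type):
// Route each record to a topic computed from its content (Kafka Streams 2.0+).
stream.to(
    (key, value, recordContext) -> computeTopicName(value),
    Produced.with(Serdes.String(), myObjectSerde));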

Replacing RocksDB with In-memory state store in Kafka Streams

I'm using the Kafka Streams 0.10.1.1 release.
The RocksDB implementation of the state store can't handle our 50k msg/s rate, so I want to change the state store to the in-memory one. This should be possible according to the docs: http://docs.confluent.io/3.1.0/streams/architecture.html#state
However, when I implement this:
val stateStore = Stores.create(stateStoreName).withStringKeys().withStringValues().inMemory().build()

val procSuppl: KStreamAggregate = ... // I'll spare the implementation details

streamBuilder.addSource(
  "mysource",
  new StringDeserializer(),
  new StringDeserializer(),
  "input_topic"
).addProcessor("proc", procSuppl, "mysource").addStateStore(stateStore, "proc")
I end up with this error at runtime:
Caused by: java.lang.ClassCastException: org.apache.kafka.streams.state.internals.MeteredKeyValueStore cannot be cast to org.apache.kafka.streams.state.internals.CachedStateStore
2017-01-23T13:19:11.830674020Z at org.apache.kafka.streams.kstream.internals.KStreamAggregate$KStreamAggregateProcessor.init(KStreamAggregate.java:62)
The implementation of the init method above (from KStreamAggregate) is:
public void init(ProcessorContext context) {
    super.init(context);
    store = (KeyValueStore<K, T>) context.getStateStore(storeName);
    ((CachedStateStore) store).setFlushListener(new ForwardingCacheFlushListener<K, V>(context, sendOldValues));
}
Why is it trying to cast the state store to a CachedStateStore instance? How can I implement a simple in-memory state store which should be possible according to docs?
Thanks
In order to create an in-memory state store, one needs to create a store supplier (using the Stores factory object):
val storeSupplier = Stores.inMemoryKeyValueStore("in-mem")
Then you need to use the store supplier when materializing a KTable:
val wordCounts = builder
  .stream[String, String]("streams-plaintext-input")
  .flatMapValues(textLine => textLine.toLowerCase.split("\\W+"))
  .groupBy((_, word) => word)
  .count()(Materialized.as(storeSupplier))
Obtain the queryable store:
val qStore = streams.store(
  wordCounts.queryableStoreName,
  QueryableStoreTypes.keyValueStore[String, Long])