Kafka Streams state store count - apache-kafka

I have the following topology in Kafka Streams 7.2.2-ccs, expressed in code:
val groupedStream = StreamsBuilder().stream<String, Quote>("quotes").groupByKey()
for (windowSize in windows()) {
    groupedStream
        .windowedBy(TimeWindows.ofSizeWithNoGrace(windowSize))
        .aggregate({ Aggregator() }, { _, quote, aggregator -> aggregator.execute(quote) })
        .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
        .toStream()
        .to("outputTopic")
}
I am using io.micrometer.core.instrument.binder.kafka.KafkaStreamsMetrics to monitor the application. I have some questions:
Why aren't there any metrics for the unbounded suppression buffer? There are many with the label rocksdb_window_state_id, but none for the suppressed store.
How many RocksDB instances will be created if the input topic has 3 partitions? There seems to be a segment concept for window stores, but I couldn't find how many segments per window will be created.
Is there a way to configure RocksDB to flush to disk all keys for windows that have already closed? The container is using too much off-heap memory, and it keeps growing; I suspect this is the cause.
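On the last point, RocksDB memory usage can be bounded through a custom RocksDBConfigSetter registered with rocksdb.config.setter. Below is a rough sketch along the lines of the bounded-memory example in the Kafka Streams documentation; the class name and the cache/write-buffer sizes are placeholders to adjust:

import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {
    // Shared across all RocksDB instances of this application so total off-heap usage stays bounded
    private static final org.rocksdb.Cache CACHE = new org.rocksdb.LRUCache(64 * 1024 * 1024L);
    private static final org.rocksdb.WriteBufferManager WRITE_BUFFER_MANAGER =
            new org.rocksdb.WriteBufferManager(16 * 1024 * 1024L, CACHE);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig.setBlockCache(CACHE);                      // index, filter and data blocks come out of one shared cache
        tableConfig.setCacheIndexAndFilterBlocks(true);
        options.setTableFormatConfig(tableConfig);
        options.setWriteBufferManager(WRITE_BUFFER_MANAGER);   // memtables are accounted against the same cache
    }

    @Override
    public void close(final String storeName, final Options options) {
        // The cache and write buffer manager are shared, so they are not closed per store
    }
}

It is registered in the Streams configuration, e.g. props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryRocksDBConfig.class);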

Related

Faster building of Kafka Streams state

I have the default 7 days of the latest streaming data stored in Kafka:
log.retention.hours=168
When deploying a new version of the Streams application, it takes a significant amount of time to process the old data before the application is actually usable.
Are there any options to make it quicker other than reducing the retention period?
What comes to my mind is that state stores shouldn't be persisted to disk until all data is processed.
I'm guessing you have state-stores with changelog topics in your app, and the thing that takes time is restoring the state of the app?
Even if the input topic has a retention setting, changelog topics have cleanup.policy set to compact by default, so they are retained indefinitely.
What is the size of your keyset? The size of a compacted changelog topic is driven by the number of distinct keys you store, so you can try reducing the keyset to get a smaller state to restore.
Consider changing segment.ms and min.cleanable.dirty.ratio on the changelog topics to optimize for compaction (see the sketch below).
Consider tuning the RocksDB config.
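As a rough sketch of the changelog-config suggestion (store name, topic name, and values are placeholders, and default serdes are assumed), these settings can be passed when materializing a store via Materialized#withLoggingEnabled:

final Map<String, String> changelogConfig = new HashMap<>();
changelogConfig.put("segment.ms", "600000");              // smaller segments roll sooner...
changelogConfig.put("min.cleanable.dirty.ratio", "0.1");  // ...so compaction kicks in earlier

// builder is a StreamsBuilder; default key/value serdes are assumed
builder.table(
        "input-topic",
        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("my-store")
                .withLoggingEnabled(changelogConfig));

The same properties can also be changed directly on already-created changelog topics through the normal topic-configuration tooling.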
What I finally came up with is processing only the last N hours of the original data in my Streams application, using a filter:
myStream.filter({ (_, value) =>
  // keep only records newer than `streamHours` hours ago
  val calendar = Calendar.getInstance()
  calendar.add(Calendar.HOUR, -streamHours)
  value.timestamp > calendar.getTimeInMillis
})

Mix of State Stores and Partitions on kafka stream instances

I built a Kafka Streams application with a state store. Now I am trying to scale this application. When running the application on three different servers, Kafka splits up partitions and state stores randomly.
For example:
Instance1 gets: partition-0, partition-1
Instance2 gets: partition-2, stateStore-repartition-0
Instance3 gets: stateStore-repartition-1, stateStore-repartition-2
I want to assign one stateStore and one partition per instance. What am I doing wrong?
My KafkaStreams Config:
final Properties properties = new Properties();
properties.setProperty(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
properties.setProperty(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS_CONFIG);
try {
    properties.setProperty(StreamsConfig.STATE_DIR_CONFIG,
            Files.createTempDirectory(stateStoreName).toAbsolutePath().toString());
} catch (final IOException e) {
    // fall back to the default state directory
}
And my stream is:
stream.groupByKey()
    .windowedBy(TimeWindows.of(timeWindowDuration))
    .<TradeStats>aggregate(
        () -> new TradeStats(),
        (k, v, tradestats) -> tradestats.add(v),
        Materialized.<String, TradeStats, WindowStore<Bytes, byte[]>>as(stateStoreName)
            .withValueSerde(new TradeStatsSerde()))
    .toStream();
From what I can see so far (as mentioned in my comment to your question, please share your state store definition), everything is fine, and I suspect a slight misconception on your side regarding the question:
What am I doing wrong?
Basically, nothing. :-)
For the partition part of your question: They get distributed around the consumers according to the configured assignor (consult https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/CooperativeStickyAssignor.html or adjacent interfaces).
For the state store part of your question: maybe there lies a small misconception about how state stores work. They are backed by a Kafka changelog topic which does not reside on your application host(s) but in the Kafka cluster itself. To be more precise, each of your application hosts holds only a part of the whole state store in its local (RocksDB) key/value store, exactly as you showed in the state store assignment in your question. These are only parts or slices of the complete state store, which is maintained in the Kafka cluster.
So in a nutshell: everything is fine; let Kafka do the assignment job and intervene only if you have really special use cases or good reasons. :-) Kafka also ensures correct redundancy and rebalancing of all partitions in case of outages of your application hosts.
If you still want to assign something on your own, the use-case would be interesting for further help.

Kafka streams co-partitioning vs interactive query

I have the following topology:
topology.addSource(WS_CONNECTION_SOURCE, new StringDeserializer(), new WebSocketConnectionEventDeserializer(),
        utilService.getTopicByType(TopicType.CONNECTION_EVENTS_TOPIC))
    .addProcessor(SESSION_PROCESSOR, WSUserSessionProcessor::new, WS_CONNECTION_SOURCE)
    .addStateStore(sessionStoreBuilder, SESSION_PROCESSOR)
    .addSink(WS_STATUS_SINK, utilService.getTopicByType(TopicType.ONLINE_STATUS_TOPIC),
        stringSerializer, stringSerializer, SESSION_PROCESSOR)

    // WS session routing
    .addSource(WS_NOTIFICATIONS_SOURCE, new StringDeserializer(), new StringDeserializer(),
        utilService.getTopicByType(TopicType.NOTIFICATION_TOPIC))
    .addProcessor(WS_NOTIFICATIONS_ROUTE_PROCESSOR, SessionRoutingEventGenerator::new, WS_NOTIFICATIONS_SOURCE)
    .addSink(WS_NOTIFICATIONS_DELIVERY_SINK, new NodeTopicNameExtractor(), WS_NOTIFICATIONS_ROUTE_PROCESSOR)
    .addStateStore(userConnectedNodesStoreBuilder, WS_NOTIFICATIONS_ROUTE_PROCESSOR, SESSION_PROCESSOR);
As you can see, there are 2 source topics. The state store is built from the first topic, and the second flow reads the state store. When I start the topology, I see that stream threads are assigned the same partitions of both source topics (co-partitioning). I assume this is because the state store is accessed by the second topic's flow.
This works fine functionally, but there is a performance problem: when there is a surge in the volume of input data on the first source topic, which updates the state store, processing of the second topic is delayed.
For me, the second topic should be processed as fast as possible; a delay in processing the first topic is fine.
I am thinking of the following strategy:
Current configuration:
WS_CONNECTION_SOURCE - 30 partitions
WS_NOTIFICATIONS_SOURCE - 30 partitions
streamThreads: 10
appInstances: 3
New configuration:
WS_CONNECTION_SOURCE - 15 partitions
WS_NOTIFICATIONS_SOURCE - 30 partitions
streamThreads: 10
appInstances: 3
Since there is no co-partitioning, tasks would have to use interactive queries to access the store.
The idea is that out of 10 threads, 5 threads will only process the second topic, which should alleviate the current problem when there is a surge on the first topic.
Here are my questions:
1. Is this strategy correct, i.e. avoiding co-partitioning and using interactive queries?
2. Is there a chance that Kafka will assign 10 partitions of WS_CONNECTION_SOURCE to one instance (since there are 10 threads) and another instance won't get any?
3. Is there any better approach to solve the performance problem?
State stores and Interactive Queries are Kafka Streams abstractions.
To use Interactive Queries you have to define a state store (using the Kafka Streams API), and that forces you to have the same number of partitions for the input topics.
I think your solution will not work. Interactive Queries are meant for exposing the ability to query state stores from outside Kafka Streams, not for access from within the Processor API.
Maybe you can review your SESSION_PROCESSOR source code, extract the heavier work into a separate sub-topology that publishes its result to an intermediate topic, and then build the state store from that topic.
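A rough DSL sketch of that idea (topic names, serdes, and the doHeavyWork step are hypothetical placeholders):

// Sub-topology 1: do the expensive session work once and publish the result
builder.stream("connection-events", Consumed.with(Serdes.String(), Serdes.String()))
        .mapValues(value -> doHeavyWork(value))   // placeholder for the expensive part of SESSION_PROCESSOR
        .to("session-state", Produced.with(Serdes.String(), Serdes.String()));

// Sub-topology 2: the routing flow builds its store from the smaller intermediate topic
final KTable<String, String> sessions = builder.table("session-state");

This keeps the heavy processing off the critical path of the notification flow, at the cost of one extra topic.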
Additionally:
Currently Kafka Streams doesn't support prioritization of input topics. There is a KIP about priorities for source topics, KIP-349, but unfortunately the linked Jira ticket was closed as Won't Fix (https://issues.apache.org/jira/browse/KAFKA-6690).

Kafka increase throughput with multiple partition and multiple consumer threads

I am using Kafka Streams for an application.
The stream flow is like below:
kafkaProducer ----> streamConsumer1 ----> finalConsumer
I have a producer which writes data very fast, and my stream consumer maps each record with some processing and forwards the result to another topic.
In my stream consumer's map, I added my own mapper function which tries to persist the relevant data, like below:
public void checkRecord(T1 key, T2 value) {
    switch (key.toString()) {
        case "key1":
            // get the relevant fields from value and persist them in the DB
            break;
        case "key2":
            // get the relevant fields from value and persist them in the DB
            break;
    }
}
KStream<String, KafkaStatusRecordWrapper>[] pDStream = myStream
        .map(this::checkRecord)
        .branch((key, value) -> value.isSuccess(), (key, value) -> !value.isSuccess());

pDStream[0].mapValues(value -> transformer(value))
        .to("other_topic", Produced.with(stringSerde, stringSerde));
Now my checkRecord consumer function is single-threaded and takes almost 300 ms to return (due to some business logic and DB persistence that I cannot avoid).
I cannot increase the number of partitions, as there are some limitations in our infrastructure and also because of the constraints below:
More Partitions Requires More Open File Handles
More Partitions May Increase Unavailability
More Partitions May Increase End-to-end Latency
So I am planning to write a multi-threaded stream consumer.
But I am concerned about the points below:
I need to process each record only once.
Handing off records to another thread will cause problems with offset management.
So how can I increase throughput?
I have enough resources on my consumer; only 40% of them are used.
You can set the streams configuration num.stream.threads to configure the number of threads; the maximum useful value is the number of input partitions. It increases the parallelism of the application instance.
Let's say your topic has 4 partitions; then you can set the following:
properties.put("num.stream.threads", 4);

Get latest values from a topic on consumer start, then continue normally

We have a Kafka producer that produces keyed messages in a very high frequency to topics whose retention time = 10 hours. These messages are real-time updates and the used key is the ID of the element whose value has changed. So the topic is acting as a changelog and will have many duplicate keys.
Now, what we're trying to achieve is that when a Kafka consumer launches, regardless of the last known state (new consumer, crashed, restart, etc..), it will somehow construct a table with the latest values of all the keys in a topic, and then keeps listening for new updates as normal, keeping the minimum load on Kafka server and letting the consumer do most of the job. We tried many ways and none of them seems the best.
What we tried:
1 changelog topic + 1 compact topic:
The producer sends the same message to both topics wrapped in a transaction to assure successful send.
Consumer launches and requests the latest offset of the changelog topic.
Consumes the compacted topic from beginning to construct the table.
Continues consuming the changelog since the requested offset.
Cons:
Having duplicate keys in the compacted topic is very likely, even with the log compaction frequency set as aggressively as possible.
Twice the number of topics on the Kafka server.
KSQL:
With KSQL we either have to rewrite a KTable as a topic so that the consumer can see it (extra topics), or we need consumers to execute a KSQL SELECT against the KSQL REST Server and query the table (not as fast and performant as the Kafka APIs).
Kafka Consumer API:
Consumer starts and consumes the topic from the beginning. This worked perfectly, but the consumer has to consume the 10-hour changelog to construct the latest-values table.
Kafka Streams:
By using KTables as follows:
KTable<Integer, MarketData> tableFromTopic = streamsBuilder.table("topic_name", Consumed.with(Serdes.Integer(), customSerde));
KTable<Integer, MarketData> filteredTable = tableFromTopic.filter((key, value) -> keys.contains(value.getRiskFactorId()));
Kafka Streams will create 1 topic on the Kafka server per KTable (named {consumer_app_id}-{topic_name}-STATE-STORE-0000000000-changelog), which will result in a huge number of topics since we have a big number of consumers.
From what we have tried, it looks like we need to either increase the server load or increase the consumer launch time. Isn't there a "perfect" way to achieve what we're trying to do?
Thanks in advance.
By using KTables, Kafka Streams will create 1 topic on Kafka server per KTable, which will result in a huge number of topics since we have a big number of consumers.
If you are just reading an existing topic into a KTable (via StreamsBuilder#table()), then no extra topics are being created by Kafka Streams. Same for KSQL.
It would help if you could clarify what exactly you want to do with the KTable(s). Apparently you are doing something that does result in additional topics being created?
1 changelog topic + 1 compact topic:
Why were you thinking about having two separate topics? Normally, changelog topics should always be compacted. And given your use case description, I don't see a reason why it should not be:
Now, what we're trying to achieve is that when a Kafka consumer launches, regardless of the last known state (new consumer, crashed, restart, etc..), it will somehow construct a table with the latest values of all the keys in a topic, and then keeps listening for new updates as normal [...]
Hence compaction would be very useful for your use case. It would also prevent this problem you described:
Consumer starts and consumes the topic from the beginning. This worked perfectly, but the consumer has to consume the 10-hour changelog to construct the latest-values table.
Note that, to reconstruct the latest table values, all three of Kafka Streams, KSQL, and the Kafka Consumer must read the table's underlying topic completely (from beginning to end). If that topic is NOT compacted, this might indeed take a long time depending on the data volume, topic retention settings, etc.
From what we have tried, it looks like we need to either increase the server load, or the consumer launch time. Isn't there a "perfect" way to achieve what we're trying to do?
Without knowing more about your use case, particularly what you want to do with the KTable(s) once they are populated, my answer would be:
Make sure the "changelog topic" is also compacted (see the sketch after this list).
Try KSQL first. If this doesn't satisfy your needs, try Kafka Streams. If this doesn't satisfy your needs, try the Kafka Consumer.
For example, I wouldn't use the Kafka Consumer if it is supposed to do any stateful processing with the "table" data, because the Kafka Consumer lacks built-in functionality for fault-tolerant stateful processing.
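As a sketch of the first recommendation (bootstrap address and topic name are placeholders), cleanup.policy can be switched to compact on an existing topic with the admin client:

import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

public class MakeTopicCompacted {
    public static void main(String[] args) throws Exception {
        try (Admin admin = Admin.create(Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "changelog-topic");
            AlterConfigOp setCompact = new AlterConfigOp(
                    new ConfigEntry(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT),
                    AlterConfigOp.OpType.SET);
            // Switch cleanup.policy to compact so old values per key are eventually removed
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setCompact))).all().get();
        }
    }
}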
Consumer starts and consumes the topic from the beginning. This worked perfectly, but the consumer has to consume the 10-hour changelog to construct the latest-values table.
During the first start of your application, what you said is correct.
To avoid this on every restart, store the key-value data in a local file.
For example, you might want to use a persistent map (like MapDB).
Since you give the consumer a group.id and commit the offsets either periodically or after each record is stored in the map, the next time your application restarts it will read from the last committed offset for that group.id.
So the problem of taking a lot of time occurs only initially (during the first run). As long as you have the file, you don't need to consume from the beginning.
If the file is not there or has been deleted, just seekToBeginning in the KafkaConsumer and build it again (see the sketch below).
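A minimal sketch of that rebuild path (topic name, group id, and the localStateMissing flag are placeholder assumptions; a plain HashMap stands in for the persistent MapDB store):

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "latest-values-app");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
Map<String, String> latestValues = new HashMap<>();  // swap in a persistent map (e.g. MapDB) for durability
boolean localStateMissing = true;                    // placeholder: true when the local file was lost

consumer.subscribe(Collections.singletonList("changelog-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        if (localStateMissing) {
            consumer.seekToBeginning(partitions);    // rebuild the table from the start of the topic
        }
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
});

while (true) {
    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
        latestValues.put(record.key(), record.value());  // last write wins, so the map ends up with the latest value per key
        // persist the record and commit offsets as described above
    }
}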
You need to store these key-values somewhere for retrieval anyway, so why can it not be a persistent store?
If you want to use Kafka Streams for whatever reason, then an alternative (not as simple as the above) is to use a persistent state store.
For example, a persistent global store:
streamsBuilder.addGlobalStore(
    Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore(topic), keySerde, valueSerde),
    topic,
    Consumed.with(keySerde, valueSerde),
    this::updateValue);
P.S.: There will be a file called .checkpoint in the state directory which stores the offsets. If the topic is deleted in the middle, you get an OffsetOutOfRangeException; you may want to handle this, perhaps with an UncaughtExceptionHandler (a sketch follows below).
Refer to https://stackoverflow.com/a/57301986/2534090 for more.
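In newer Kafka Streams versions (2.8+), such failures can be intercepted with a streams-level uncaught exception handler. A minimal sketch (the chosen response is just an example; streams is the running KafkaStreams instance):

// streams is the KafkaStreams instance of the application
streams.setUncaughtExceptionHandler(exception -> {
    // e.g. log an OffsetOutOfRangeException caused by the source topic being deleted and recreated
    System.err.println("Stream thread died: " + exception);
    return StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.SHUTDOWN_CLIENT;
});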
Finally, it is better to use a Consumer with a persistent file rather than Streams for this, because of the simplicity it offers.