Kafka Streams KTable size monitoring

I have a streams topology which consumes from a topic, runs an aggregation, and builds a KTable that is materialized into RocksDB.
I have another application that consumes all events from that same topic daily and sends tombstone messages for events that meet certain criteria (i.e. they are no longer needed).
The aggregation handles this and deletes the corresponding entries from the state store, but I'm looking to monitor either the size of the state store or the changelog topic; anything, really, that tells me the size of the KTable.
I have exposed the JMX metrics, but there is nothing there that appears to give me what I need. I can see the total number of "puts" into RocksDB, but not the total number of keys.
My apps are Spring Boot and I would like to expose the metrics via Prometheus.
Has anyone solved this issue, or have any ideas that would help?

You can get the approximate count of keys in each partition by accessing the underlying state store of the KTable via KeyValueStore#approximateNumEntries(), and then export that count to Prometheus (each partition has its own count).
To access the underlying state store, you can use the low-level Processor API to reach the KeyValueStore through the ProcessorContext of each StreamTask (one per partition). Just add a KStream#transformValues() to your topology:
kStream
...
.transformValues(ExtractCountTransformer::new, "your_ktable_name")
...
And in ExtractCountTransformer, extract the count and expose it to Prometheus:
import lombok.extern.log4j.Log4j2;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

@Log4j2
public class ExtractCountTransformer implements ValueTransformerWithKey<String, String, String> {

    private KeyValueStore<String, String> yourKTableKvStore;
    private ProcessorContext context;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        yourKTableKvStore = (KeyValueStore<String, String>) context.getStateStore("your_ktable_name");
    }

    @Override
    public String transform(String readOnlyKey, String value) {
        // export the approximate count to Prometheus here
        log.debug("partition {} - approx count {}", context.partition(), yourKTableKvStore.approximateNumEntries());
        return value;
    }

    @Override
    public void close() {
    }
}
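Since your applications are Spring Boot and already export to Prometheus, one way to surface the per-partition count is to register it as a Micrometer gauge from the transformer's init() method instead of only logging it. A minimal sketch, assuming a MeterRegistry bean is handed to the transformer and with placeholder metric/tag names:

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import org.apache.kafka.streams.state.KeyValueStore;

public class KTableSizeGauge {

    // Registers one gauge per partition. Micrometer keeps only a weak reference to the
    // store, so the transformer must hold on to it for as long as the task is alive.
    public static void register(MeterRegistry registry, int partition,
                                KeyValueStore<String, String> store) {
        Gauge.builder("ktable.approximate.num.entries", store, KeyValueStore::approximateNumEntries)
                .tag("partition", String.valueOf(partition))
                .description("Approximate number of keys in the KTable state store")
                .register(registry);
    }
}

Calling this from init() works because both the partition (via context.partition()) and the state store are available there.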

If you have the JMX metrics exposed, you can get many Kafka metrics; the one you are looking for is kafka_stream_state_estimate_num_keys.

Related

Kafka Streams RocksDB Config - Static cache understanding

I'm using Kafka Streams 3.0.0, Spring Cloud Stream and Micrometer to surface the metrics to /actuator/prometheus. The metric I'm referring to is kafka_stream_state_block_cache_capacity, which I believe is equivalent to the block-cache-capacity metric from this Confluent document.
I came across this Medium article which mentions that a static cache within the RocksDB config setter class is applied per StreamThread. I also came across a Confluent document that says that by using static, the memory usage across all instances can be bounded.
In my setup, one application is handling more than one Kafka topic partition. However, from the metric, I see that the different partitions are assigned to the same StreamThread and the memory is multiplied by the number of partitions that the application is handling.
For example, my application handles two Kafka partitions, and my RocksDB config is as shown below:
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

    static {
        RocksDB.loadLibrary();
    }

    private static long lruCacheBytes = 100L * 1024L * 1024L; //100MB
    private static long memtableBytes = 1024 * 1024;
    private static int nMemtables = 1;
    private static long writeBufferManagerBytes = 95 * 1024 * 1024;

    private static org.rocksdb.Cache cache = new org.rocksdb.LRUCache(lruCacheBytes);
    private static org.rocksdb.WriteBufferManager writeBufferManager = new org.rocksdb.WriteBufferManager(writeBufferManagerBytes, cache);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig.setBlockCache(cache);
        tableConfig.setCacheIndexAndFilterBlocks(true);
        options.setWriteBufferManager(writeBufferManager);

        // These options are recommended to be set when bounding the total memory
        tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true);
        tableConfig.setPinTopLevelIndexAndFilter(true);
        options.setMaxWriteBufferNumber(nMemtables);
        options.setWriteBufferSize(memtableBytes);
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {}
}
I can see an entry like this in /actuator/prometheus:
kafka_stream_state_block_cache_capacity{kafka_version="3.0.0",rocksdb_window_state_id="one.minute.window.count",spring_id="stream-builder-process",task_id="0_0",thread_id="7ed0af6a-244f-4b87-b4cf-f2f311df976c-StreamThread-1",} 2.097152E8
kafka_stream_state_block_cache_capacity{kafka_version="3.0.0",rocksdb_window_state_id="one.minute.window.count",spring_id="stream-builder-process",task_id="0_1",thread_id="7ed0af6a-244f-4b87-b4cf-f2f311df976c-StreamThread-1",} 2.097152E8
The entries above show two tasks (because the application handles two Kafka topic partitions), each task using ~200MB.
My understanding is that each task uses ~200MB because the static cache is set to 100MB and based on this StackOverflow answer, there will be 2 segments created where each segment relates to a state store. Therefore ~100MB * 2 = ~200MB.
Also, since there are two entries for the kafka_stream_state_block_cache_capacity metric, one per task, it means that my application uses a total of ~200MB * 2 = ~400MB.
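For reference, the reported figure works out to exactly twice the configured LRU cache size:

// Arithmetic check only: reported block-cache capacity vs. the configured cache size.
public class CacheCapacityCheck {
    public static void main(String[] args) {
        long lruCacheBytes = 100L * 1024L * 1024L; // 104_857_600 bytes, as configured above
        long reported = 209_715_200L;              // 2.097152E8 from the metric
        System.out.println(reported / lruCacheBytes); // prints 2
    }
}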
Is my understanding of memory allocation correct and is the static cache allocated per partition, instead of per Stream Thread?

Kafka streams event deduplication keeping last event in window

I'm using Kafka Streams for an event-deduplication problem over short time windows (<= 1 minute).
First I tried to tackle the problem with the DSL API, using the .suppress(Suppressed.untilWindowCloses(...)) operator, but given that wall-clock time is not yet supported (I've seen KIP-424), this operator is not viable for my use case.
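For reference, the DSL attempt looked roughly like this (inputStream, the window size and the reduce function are illustrative):

// Keep only the last event per key in each 1-minute window and emit it when the
// window closes (blocked, in practice, by stream-time-based suppression).
KStream<String, String> deduplicated = inputStream
        .groupByKey()
        .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
        .reduce((previous, latest) -> latest)
        .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
        .toStream()
        .map((windowedKey, value) -> KeyValue.pair(windowedKey.key(), value));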
Then I followed this official Confluent example, in which the low-level Processor API is used. It works fine but has one major limitation for my use case: the single event (obtained by deduplication) is emitted at the beginning of the time window, and subsequent duplicates are "suppressed". In my use case I need the reverse, meaning that the single event should be emitted at the end of the window.
I'm asking for suggestions on how to implement this use case with Processor API.
My idea was to use the Processor API with a custom Transformer and a Punctuator.
The transformer would store in a WindowStore the distinct keys received without returning any KeyValue. Simultaneously, I'd schedule a punctuator running with an interval equal to the size of the window in the WindowStore. This punctuator will iterate over the elements in the store and forward them downstream.
The following are some core parts of the logic:
DeduplicationTransformer (slightly modified from the official Confluent example):
@Override
@SuppressWarnings("unchecked")
public void init(final ProcessorContext context) {
    this.context = context;
    eventIdStore = (WindowStore<E, V>) context.getStateStore(this.storeName);
    // Schedule punctuator for this transformer.
    context.schedule(Duration.ofMillis(this.windowSizeMs), PunctuationType.WALL_CLOCK_TIME,
            new DeduplicationPunctuator<E, V>(eventIdStore, context, this.windowSizeMs));
}

@Override
public KeyValue<K, V> transform(final K key, final V value) {
    final E eventId = idExtractor.apply(key, value);
    if (eventId == null) {
        return KeyValue.pair(key, value);
    } else {
        if (!isDuplicate(eventId)) {
            rememberNewEvent(eventId, value, context.timestamp());
        }
        return null;
    }
}
DeduplicationPunctuator:
public DeduplicationPunctuator(WindowStore<E, V> eventIdStore, ProcessorContext context,
                               long retainPeriodMs) {
    this.eventIdStore = eventIdStore;
    this.context = context;
    this.retainPeriodMs = retainPeriodMs;
}

@Override
public void punctuate(long invocationTime) {
    LOGGER.info("Punctuator invoked at {}, searching from {}", new Date(invocationTime), new Date(invocationTime - retainPeriodMs));
    KeyValueIterator<Windowed<E>, V> it =
            eventIdStore.fetchAll(invocationTime - retainPeriodMs, invocationTime + retainPeriodMs);
    while (it.hasNext()) {
        KeyValue<Windowed<E>, V> next = it.next();
        LOGGER.info("Punctuator running on {}", next.key.key());
        context.forward(next.key.key(), next.value);
        // Delete from store with tombstone
        eventIdStore.put(next.key.key(), null, invocationTime);
        context.commit();
    }
    it.close();
}
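For completeness, the store and the transformer are wired up roughly like this (imports omitted; store name, topics, serdes, window/retention sizes and the transformer's constructor arguments are placeholders for my actual setup):

// Window store holding the distinct keys seen in the current window.
StoreBuilder<WindowStore<String, String>> dedupStoreBuilder = Stores.windowStoreBuilder(
        Stores.persistentWindowStore("dedup-store",
                Duration.ofMinutes(2),   // retention period
                Duration.ofMinutes(1),   // window size
                false),                  // do not retain duplicates
        Serdes.String(),
        Serdes.String());

StreamsBuilder builder = new StreamsBuilder();
builder.addStateStore(dedupStoreBuilder);

builder.<String, String>stream("input-topic")
        .transform(() -> new DeduplicationTransformer<String, String, String>(
                Duration.ofMinutes(1).toMillis(), "dedup-store", (key, value) -> key), "dedup-store")
        .to("deduplicated-topic");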
Is this a valid approach?
With the previous code, I'm running some integration tests and I have some synchronization issues. How can I be sure that the start of the window will coincide with the punctuator's scheduled interval?
Also, as an alternative approach, I was wondering (I've googled with no result) whether there is any event triggered by the window closing to which I could attach a callback, in order to iterate over the store and publish only distinct events.
Thanks.

Kafka Processor API with exactly once semantic for each record processed

Scenario:
We are using the Kafka Processor API (not the DSL) to read records from a source topic; the stream processor writes records to one or more target topics.
We know exactly-once can be enabled at the level of the entire processor by using:
props.put("isolation.level", "read_committed");
But we want to decide, based on the incoming record's key, whether we want exactly-once or at-least-once semantics.
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class StreamRouterProcessor implements Processor<String, String> {

    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
    }

    @Override
    public void process(String eventName, String eventMessage) { // this is called for each record
    }

    @Override
    public void close() {
    }
}
Is there a way to select exactly-once or at-least-once on the fly for each record being processed (perhaps for each record processed by the process() method above)?
For enabling exactly-once semantics you should use the StreamsConfig.PROCESSING_GUARANTEE_CONFIG property. ConsumerConfig.ISOLATION_LEVEL_CONFIG (isolation.level) is a consumer config and should only be used if you work with a raw Consumer.
It is not possible to choose the processing guarantee (exactly-once or at-least-once) at the message level.
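A minimal sketch of the application-wide setting (application id and bootstrap servers are placeholders; on older Streams versions use StreamsConfig.EXACTLY_ONCE instead of EXACTLY_ONCE_V2):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class ExactlyOnceConfig {

    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stream-router-app");   // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        // The processing guarantee applies to the whole Streams application,
        // not to individual records:
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }
}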

Stateful filtering/flatMapValues in Kafka Streams?

I'm trying to write a simple Kafka Streams application (targeting Kafka 2.2/Confluent 5.2) to transform an input topic with at-least-once semantics into an exactly-once output stream. I'd like to encode the following logic:
For each message with a given key:
Read a message timestamp from a string field in the message value
Retrieve the greatest timestamp we've previously seen for this key from a local state store
If the message timestamp is less than or equal to the timestamp in the state store, don't emit anything
If the timestamp is greater than the timestamp in the state store, or the key doesn't exist in the state store, emit the message and update the state store with the message's key/timestamp
(This is guaranteed to provide correct results based on ordering guarantees that we get from the upstream system; I'm not trying to do anything magical here.)
At first I thought I could do this with the Kafka Streams flatMapValues operator, which lets you map each input message to zero or more output messages with the same key. However, that documentation explicitly warns:
This is a stateless record-by-record operation (cf. transformValues(ValueTransformerSupplier, String...) for stateful value transformation).
That sounds promising, but the transformValues documentation doesn't make it clear how to emit zero or one output messages per input message. Unless that's what the // or null aside in the example is trying to say?
flatTransform also looked somewhat promising, but I don't need to manipulate the key, and if possible I'd like to avoid repartitioning.
Anyone know how to properly perform this kind of filtering?
You could use a Transformer to implement the stateful operation you described above. In order not to propagate a message downstream, you need to return null from the transform method; this is mentioned in the Transformer javadoc. You can then control propagation explicitly via processorContext.forward(key, value). A simplified example is provided below:
kStream.transform(() -> new DemoTransformer(stateStoreName), stateStoreName)
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class DemoTransformer implements Transformer<String, String, KeyValue<String, String>> {

    private ProcessorContext processorContext;
    private String stateStoreName;
    private KeyValueStore<String, String> keyValueStore;

    public DemoTransformer(String stateStoreName) {
        this.stateStoreName = stateStoreName;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext processorContext) {
        this.processorContext = processorContext;
        this.keyValueStore = (KeyValueStore<String, String>) processorContext.getStateStore(stateStoreName);
    }

    @Override
    public KeyValue<String, String> transform(String key, String value) {
        String existingValue = keyValueStore.get(key);
        if (/* your condition */) {
            processorContext.forward(key, value);
            keyValueStore.put(key, value);
        }
        return null;
    }

    @Override
    public void close() {
    }
}
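Applied to the timestamp logic described in the question, the condition could look roughly like the sketch below; extractTimestamp is a hypothetical helper that reads the timestamp field from the value, and the comparison assumes the timestamps are encoded so that string ordering matches chronological ordering (e.g. ISO-8601):

@Override
public KeyValue<String, String> transform(String key, String value) {
    String incomingTs = extractTimestamp(value);   // hypothetical helper reading the timestamp field
    String storedTs = keyValueStore.get(key);      // greatest timestamp seen so far for this key, or null
    if (storedTs == null || incomingTs.compareTo(storedTs) > 0) {
        processorContext.forward(key, value);      // emit the message downstream
        keyValueStore.put(key, incomingTs);        // remember the new greatest timestamp
    }
    return null;                                   // nothing is emitted via the return value
}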

How can I create a state store that is restorable from an existing changelog topic?

I am using the streams DSL to deduplicate a topic called users:
topology.addStateStore(Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("users"), byteStringSerde, userSerde));
KStream<ByteString, User> users = topology.stream("users", Consumed.with(byteStringSerde, userSerde));
users.transform(() -> new Transformer<ByteString, User, KeyValue<ByteString, User>>() {
    private KeyValueStore<ByteString, User> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        store = (KeyValueStore<ByteString, User>) context.getStateStore("users");
    }

    @Override
    public KeyValue<ByteString, User> transform(ByteString key, User value) {
        User user = store.get(key);
        if (user != null) {
            store.put(key, value);
            return new KeyValue<>(key, value);
        }
        return null;
    }

    @Override
    public KeyValue<ByteString, User> punctuate(long timestamp) {
        return null;
    }

    @Override
    public void close() {
    }
}, "users");
Given this code, Kafka Streams creates an internal changelog topic for the users store. I am wondering, is there some way I can use the existing users topic instead of creating an essentially identical changelog topic?
PS. I see that StreamsBuilder says this is possible:
However, no internal changelog topic is created since the original input topic can be used for recovery
But following the code to InternalStreamsBuilder#table() and InternalStreamsBuilder#createKTable(), I am not seeing how it's achieving this effect.
Not all things the DSL does are possible at the Processor API level; it uses some internals that are not part of the public API to achieve what you describe.
It's the call to InternalTopologyBuilder#connectSourceStoreAndTopic() that does the trick (cf. InternalStreamsBuilder#table()).
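For comparison, the public-API way to get this behaviour is to read the topic as a table via the DSL, so that the input topic itself can serve for restoration (in newer versions this reuse also depends on the topology.optimization setting). A minimal sketch, reusing the serde names from the question:

StreamsBuilder builder = new StreamsBuilder();
KTable<ByteString, User> users = builder.table(
        "users",
        Consumed.with(byteStringSerde, userSerde),
        Materialized.<ByteString, User, KeyValueStore<Bytes, byte[]>>as("users")
                .withKeySerde(byteStringSerde)
                .withValueSerde(userSerde));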
For your de-duplication use case, it seems that you would need two topics though (depending on what de-duplication logic you apply). Restoring from a changelog topic applies key-based updates and thus does not consider values (which might be part of your de-duplication logic, too).