I'm using Kafka Streams 3.0.0, Spring Cloud Stream and Micrometer to surface the metrics to /actuator/prometheus. The metric I'm referring to is kafka_stream_state_block_cache_capacity, which I believe is equivalent to the block-cache-capacity metric from this Confluent document.
I came across this Medium article, which mentions that the RocksDB config setter class (with its static cache) will be executed per StreamThread. I also came across a Confluent document that says that by using static, the memory usage across all instances can be bounded.
In my setup, one application is handling more than one Kafka topic partition. However, from the metric, I see that the different partitions are assigned to the same StreamThread and the memory is multiplied by the number of partitions that the application is handling.
For example, if my application handles two Kafka partitions, and my RocksDB config is as shown below:
public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

    static {
        RocksDB.loadLibrary();
    }

    private static long lruCacheBytes = 100L * 1024L * 1024L; // 100MB
    private static long memtableBytes = 1024 * 1024;
    private static int nMemtables = 1;
    private static long writeBufferManagerBytes = 95 * 1024 * 1024;

    private static org.rocksdb.Cache cache = new org.rocksdb.LRUCache(lruCacheBytes);
    private static org.rocksdb.WriteBufferManager writeBufferManager =
            new org.rocksdb.WriteBufferManager(writeBufferManagerBytes, cache);

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig.setBlockCache(cache);
        tableConfig.setCacheIndexAndFilterBlocks(true);
        options.setWriteBufferManager(writeBufferManager);

        // These options are recommended to be set when bounding the total memory
        tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true);
        tableConfig.setPinTopLevelIndexAndFilter(true);
        options.setMaxWriteBufferNumber(nMemtables);
        options.setWriteBufferSize(memtableBytes);
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {}
}
I can see an entry like this in /actuator/prometheus:
kafka_stream_state_block_cache_capacity{kafka_version="3.0.0",rocksdb_window_state_id="one.minute.window.count",spring_id="stream-builder-process",task_id="0_0",thread_id="7ed0af6a-244f-4b87-b4cf-f2f311df976c-StreamThread-1",} 2.097152E8
kafka_stream_state_block_cache_capacity{kafka_version="3.0.0",rocksdb_window_state_id="one.minute.window.count",spring_id="stream-builder-process",task_id="0_1",thread_id="7ed0af6a-244f-4b87-b4cf-f2f311df976c-StreamThread-1",} 2.097152E8
The entries above show two tasks (because the application handles two Kafka topic partitions), each using ~200MB.
My understanding is that each task uses ~200MB because the static cache is set to 100MB and based on this StackOverflow answer, there will be 2 segments created where each segment relates to a state store. Therefore ~100MB * 2 = ~200MB.
Also, since there are two entries for the kafka_stream_state_block_cache_capacity metric, one per task, it means that my application uses a total of ~200MB * 2 = ~400MB.
Is my understanding of memory allocation correct and is the static cache allocated per partition, instead of per Stream Thread?
I am trying to create a custom joining consumer to join multiple events.
I have created a topology which has four sub-topologies (subtopology-0, subtopology-1, subtopology-2, subtopology-3), not in the exact order described by topology.describe().
I have created a state store in three of the sub-topologies (subtopology-0, subtopology-1, subtopology-2) and I am trying to attach all of the state stores created in the different sub-topologies using .connectProcessorAndStateStores("PROCESS2", "COUNTS"), as per the Kafka developer guide https://kafka.apache.org/0110/documentation/streams/developer-guide
Here is the code snippet of how I am creating and attaching processors to the topology.
class StreamCustomizer implements KafkaStreamsInfrastructureCustomizer {

    public void someMethod(StreamsBuilder builder) {
        Topology topology = builder.build();
        topology.addProcessor("Processor1", new Processor() {...}, "state-store-1")
                .addStateStore(store1, ...);
        topology.addProcessor("Processor2", new Processor() {...}, "state-store-1")
                .addStateStore(store1, ...);
        topology.addProcessor("Processor3", new Processor() {...}, "state-store-1")
                .addStateStore(store1, ...);
        topology.addProcessor("Processor4", new Processor4() {...}, "Processor1", "Processor2", "Processor3")
                .connectProcessorAndStateStores("Processor4", "state-store-1", "state-store-2", "state-store-3");
    }
}
This is how the processor is defined for all the sub-topologies described above:
new Processor() {
    private ProcessorContext context;
    private KeyValueStore<K, V> store;

    public void init(ProcessorContext context) {
        this.context = context;
        store = (KeyValueStore<K, V>) context.getStateStore("store-name");
    }
}
This is how processor 4 is written, with all the state stores retrieved from the context in the init method.
new Processor4() {
private KeyValueStore<K, V> store1;
private KeyValueStore<K, V> store2;
private KeyValueStore<K, V> store3;
}
I am observing a strange behaviour with the above code: store1, store2, and store3 are all re-initialized and none of the keys stored in their respective sub-topologies (1, 2, 3) are preserved. However, the same code works, i.e., all state stores preserve the key-values stored in their respective sub-topologies, when the state stores are declared at class level.
class StreamCustomizer implements KafkaStreamsInfrastructureCustomizer {
private KeyValueStore <K, V> store1;
private KeyValueStore <K, V> store2;
private KeyValueStore <K, V> store3;
}
and then, in the processor implementation, just initialize the state stores in the init method:
new Processor() {
    private ProcessorContext context;

    public void init(ProcessorContext context) {
        this.context = context;
        store1 = (KeyValueStore<K, V>) context.getStateStore("store-name-1");
    }
}
Can someone please assist in finding the reason, or point out if there is anything wrong in this topology? Also, I have read that state stores can be shared within the same sub-topology.
Hard to say (the code snippets are not really clear); however, if you share state you effectively merge sub-topologies. Thus, if you do it correctly, you would end up with a single sub-topology containing all your processors.
As long as you see 4 sub-topologies, the state stores are not shared yet, i.e., they are not connected correctly.
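Not a fix for your exact snippet (which is hard to follow), but a minimal sketch of what "connected correctly" can look like with the plain Processor API. The topic names, the store name, and the WriteToStore processor are made up for illustration; because both processors are connected to the same store, topology.describe() should report a single sub-topology:
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class SharedStoreTopologySketch {

    // Trivial processor that writes every record into the shared store.
    static class WriteToStore implements Processor<String, String, Void, Void> {
        private KeyValueStore<String, String> store;

        @Override
        public void init(ProcessorContext<Void, Void> context) {
            store = context.getStateStore("shared-store");
        }

        @Override
        public void process(Record<String, String> record) {
            store.put(record.key(), record.value());
        }
    }

    public static Topology build() {
        // One store builder, added once, connected to every processor that needs it.
        StoreBuilder<KeyValueStore<String, String>> sharedStore =
                Stores.keyValueStoreBuilder(
                        Stores.persistentKeyValueStore("shared-store"),
                        Serdes.String(), Serdes.String());

        Topology topology = new Topology();
        topology.addSource("source-a", "topic-a");
        topology.addSource("source-b", "topic-b");
        topology.addProcessor("Processor1", WriteToStore::new, "source-a");
        topology.addProcessor("Processor2", WriteToStore::new, "source-b");

        // The store is added once and connected to both processors; because they
        // now share a store, Kafka Streams merges them into one sub-topology.
        topology.addStateStore(sharedStore, "Processor1");
        topology.connectProcessorAndStateStores("Processor2", "shared-store");

        return topology;
    }
}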
I have a microservice that performs some stateful processing. The application constructs a KStream from an input topic, does some stateful processing, then writes data into the output topic.
I will be running 3 of these applications in the same group. There are 3 parameters that I need to store so that, in the event the microservice goes down, the microservice that takes over can query the shared state store and continue where the crashed service left off.
I am thinking of pushing these 3 parameters into a state store and querying the data when the other microservice takes over. From my research, I have seen a lot of examples where people perform event counting using a state store, but that's not exactly what I want. Does anyone know an example, or what the right approach for this problem is?
So you want to do 2 things:
a. the service going down has to store the parameters:
If you want to do it in a straightforward way, then all you have to do is write a message to the topic associated with the state store (the one you are reading with a KTable). Use the Kafka Producer API or a KStream (it could be kTable.toStream()) to do it, and that's it.
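For instance, a minimal sketch using the plain Producer API; the topic name matches the "parametersTopicName" used in the KTable example under (b), and the parameter keys/values here are just placeholders:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ParameterWriter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // "parametersTopicName" is the topic the KTable in part (b) reads from;
        // the keys and values below are placeholders for your 3 parameters.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("parametersTopicName", "lastProcessedOffset", "42"));
            producer.send(new ProducerRecord<>("parametersTopicName", "checkpointTimestamp", "2021-01-01T00:00:00Z"));
        }
    }
}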
Otherwise you could manually create a state store:
// take these serdes as just an example
Serde<String> keySerde = Serdes.String();
Serde<String> valueSerde = Serdes.String();
KeyValueBytesStoreSupplier storeSupplier = Stores.inMemoryKeyValueStore(stateStoreName);
streamsBuilder.addStateStore(Stores.keyValueStoreBuilder(storeSupplier, keySerde, valueSerde));
then use it in a transformer or processor to add items to it; you'll have to declare this in the transformer/processor:
// depending on the serde above you might have something else than String
private KeyValueStore<String, String> stateStore;
and initialize the stateStore variable:
@Override
public void init(ProcessorContext context) {
    stateStore = (KeyValueStore<String, String>) context.getStateStore(stateStoreName);
}
and later use the stateStore variable:
@Override
public KeyValue<String, String> transform(String key, String value) {
    // using stateStore among other actions you might take here
    stateStore.put(key, processedValue);
    return KeyValue.pair(key, processedValue);
}
b. read the parameters in the service taking over:
You could do it with a Kafka consumer but with Kafka Streams you first have to make the store available; the easiest way to do it is by creating a KTable; then you have to get the queryable store name that is automatically created with the KTable; then you have to actually get access to the store; then you extract a record value from the store (i.e. a parameter value by its key).
// this example is a modified copy of KTable javadocs example
final StreamsBuilder streamsBuilder = new StreamsBuilder();
// Creating a KTable over the topic containing your parameters a store shall automatically be created.
//
// The serde for your MyParametersClassType could be
// new org.springframework.kafka.support.serializer.JsonSerde(MyParametersClassType.class)
// though further configurations might be necessary here - e.g. setting the trusted packages for the ObjectMapper behind JsonSerde.
//
// If the parameter-value class is a String then you could use Serdes.String() instead of a MyParametersClassType serde.
final KTable paramsTable = streamsBuilder.table("parametersTopicName", Consumed.with(Serdes.String(), <<your InstanceOfMyParametersClassType serde>>));
...
// see the example from KafkaStreams javadocs for more KafkaStreams related details
final KafkaStreams streams = ...;
streams.start();
...
// get the queryable store name that is automatically created with the KTable
final String queryableStoreName = paramsTable.queryableStoreName();
// get access to the store
ReadOnlyKeyValueStore view = streams.store(queryableStoreName, QueryableStoreTypes.timestampedKeyValueStore());
// extract a record value from the store
InstanceOfMyParametersClassType parameter = view.get(key);
I have a stream topology which consumes from a topic, runs an aggregation and builds a KTable which is materialized into RocksDB.
I have another application that consumes all events from that same topic daily, and sends tombstone messages for events that meet some specific criteria (i.e. they are no longer needed).
The aggregation deals with this and deletes from the state stores, but I'm looking at monitoring either the size of the state store or the changelog topic - anything really that tells me the size of the KTable.
I have exposed the JMX metrics, but there is nothing there that appears to give me what I need. I can see the total number of "puts" into RocksDB, but not the total number of keys.
My apps are Spring Boot and I would like to expose the metrics via Prometheus.
Has anyone solved this issue or any ideas that would help?
You can get the approximate count for each partition by accessing the underlying state store of the KTable using KeyValueStore#approximateNumEntries(), and then export this count to Prometheus (each partition has one count).
To access the underlying state store, you can use the low-level Processor API to get access to a KeyValueStore through the ProcessorContext of each StreamTask (which corresponds to a partition). Just add a KStream#transformValues() to your topology:
kStream
...
.transformValues(ExtractCountTransformer::new, "your_ktable_name")
...
And in ExtractCountTransformer extract the count to prometheus:
@Log4j2
public class ExtractCountTransformer implements ValueTransformerWithKey<String, String, String> {

    private KeyValueStore<String, String> yourKTableKvStore;
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        yourKTableKvStore = (KeyValueStore<String, String>) context.getStateStore("your_ktable_name");
    }

    @Override
    public String transform(String readOnlyKey, String value) {
        // extract the count and export it to Prometheus
        log.debug("partition {} - approx count {}", context.partition(), yourKTableKvStore.approximateNumEntries());
        return value;
    }

    @Override
    public void close() {
    }
}
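Since you are on Spring Boot with Micrometer/Prometheus, instead of only logging the count you could register a gauge that re-reads the store on every scrape. This is only a sketch under the assumption that you can hand a MeterRegistry to the transformer (constructor injection here); the metric name and tag are made up:
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class CountGaugeTransformer implements ValueTransformerWithKey<String, String, String> {

    private final MeterRegistry meterRegistry;
    private KeyValueStore<String, String> store;

    public CountGaugeTransformer(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    @Override
    public void init(ProcessorContext context) {
        store = (KeyValueStore<String, String>) context.getStateStore("your_ktable_name");
        // One gauge per partition; the gauge calls approximateNumEntries() on every scrape.
        Gauge.builder("ktable_approx_num_entries", store, KeyValueStore::approximateNumEntries)
                .tag("partition", String.valueOf(context.partition()))
                .register(meterRegistry);
    }

    @Override
    public String transform(String readOnlyKey, String value) {
        return value; // pass-through; the gauge does the reporting
    }

    @Override
    public void close() {
    }
}
You would then wire it in with something like kStream.transformValues(() -> new CountGaugeTransformer(meterRegistry), "your_ktable_name").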
If you have JMX metrics exposed, you can get many Kafka metrics; the one that you are looking for is kafka_stream_state_estimate_num_keys.
I am trying to understand Kafka in some detail with respect to Kafka Streams (the Kafka Streams client to Kafka).
I understand that KafkaConsumer (the Java client) gets data from Kafka; however, I am not able to understand at what frequency the client polls the Kafka topic to fetch the data.
The frequency of the poll is defined by your code, because you're responsible for calling poll.
A very naive example of user code using KafkaConsumer is the following:
public class KafkaConsumerExample {
    ...
    static void runConsumer() throws InterruptedException {
        final Consumer<Long, String> consumer = createConsumer();

        final int giveUp = 100;
        int noRecordsCount = 0;

        while (true) {
            final ConsumerRecords<Long, String> consumerRecords = consumer.poll(1000);

            if (consumerRecords.count() == 0) {
                noRecordsCount++;
                if (noRecordsCount > giveUp) break;
                else continue;
            }

            consumerRecords.forEach(record -> {
                System.out.printf("Consumer Record:(%d, %s, %d, %d)\n",
                        record.key(), record.value(),
                        record.partition(), record.offset());
            });

            consumer.commitAsync();
        }
        consumer.close();
        System.out.println("DONE");
    }
}
In this case the frequency is defined by the duration of processing the messages in consumerRecords.forEach.
However, keep in mind that if you don't call poll "fast enough" your consumer will be considered dead by the broker coordinator and a rebalance will be triggered.
This "fast enough" is determined by the property max.poll.interval.ms in kafka >= 0.10.1.0. See this answer for more details.
The default value of max.poll.interval.ms is five minutes, so if your consumerRecords.forEach takes longer than that, your consumer will be considered dead.
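If a single batch can legitimately take longer, one option is to raise max.poll.interval.ms (and/or lower max.poll.records) when building the consumer. A sketch, with placeholder broker address, group id and values:
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerFactory {
    static KafkaConsumer<Long, String> createConsumer() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Raise the allowed gap between poll() calls from the default 5 minutes to 10 minutes
        // if processing one batch can take longer.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600_000);
        // Alternatively, reduce the batch size so each poll() returns fewer records.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);
        return new KafkaConsumer<>(props);
    }
}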
If you don't want to use the raw KafkaConsumer directly, you could use Alpakka Kafka, a library for consuming from and producing to Kafka topics in a safe and backpressured way (it is based on Akka Streams).
With this library, the frequency of poll is determined by the configuration akka.kafka.consumer.poll-interval.
We say it is safe because it will continue polling to avoid the consumer being considered dead, even when your processing can't keep up with the rate. It's able to do this because KafkaConsumer allows pausing the consumer:
/**
 * Suspend fetching from the requested partitions. Future calls to {@link #poll(Duration)} will not return
 * any records from these partitions until they have been resumed using {@link #resume(Collection)}.
 * Note that this method does not affect partition subscription. In particular, it does not cause a group
 * rebalance when automatic assignment is used.
 * @param partitions The partitions which should be paused
 * @throws IllegalStateException if any of the provided partitions are not currently assigned to this consumer
 */
@Override
public void pause(Collection<TopicPartition> partitions) { ... }
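For illustration only (this is not Alpakka's actual code), the pause/resume idea looks roughly like this with a raw KafkaConsumer: keep calling poll() so you stay within max.poll.interval.ms, but stop fetching new records while already-fetched records are still being processed. The BacklogCheck hook is hypothetical:
import java.time.Duration;
import java.util.Set;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

public class PauseResumeSketch {

    static void waitForBacklog(Consumer<Long, String> consumer, BacklogCheck backlog) {
        Set<TopicPartition> assigned = consumer.assignment();
        consumer.pause(assigned);                   // stop fetching new records
        while (backlog.stillProcessing()) {
            consumer.poll(Duration.ofMillis(100));  // returns no records for paused partitions,
                                                    // but keeps the consumer within max.poll.interval.ms
        }
        consumer.resume(assigned);                  // start fetching again
    }

    interface BacklogCheck {
        boolean stillProcessing();                  // hypothetical hook into your processing pipeline
    }
}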
To fully understand this you should read about akka-streams and backpressure.
According to the Apache Beam 2.0.0 SDK Documentation GroupIntoBatches works only with KV collections.
My dataset contains only values and there's no need for introducing keys. However, to make use of GroupIntoBatches I had to implement “fake” keys with an empty string as a key:
static class FakeKVFn extends DoFn<String, KV<String, String>> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        c.output(KV.of("", c.element()));
    }
}
So the overall pipeline looks like the following:
public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.create();
    Pipeline p = Pipeline.create(options);

    long batchSize = 100L;
    p.apply("ReadLines", TextIO.read().from("./input.txt"))
        .apply("FakeKV", ParDo.of(new FakeKVFn()))
        .apply(GroupIntoBatches.<String, String>ofSize(batchSize))
        .setCoder(KvCoder.of(StringUtf8Coder.of(), IterableCoder.of(StringUtf8Coder.of())))
        .apply(ParDo.of(new DoFn<KV<String, Iterable<String>>, String>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                c.output(callWebService(c.element().getValue()));
            }
        }))
        .apply("WriteResults", TextIO.write().to("./output/"));

    p.run().waitUntilFinish();
}
Is there any way to group into batches without introducing “fake” keys?
It is required to provide KV inputs to GroupIntoBatches because the transform is implemented using state and timers, which are per key-and-window.
For each key+window pair, state and timers necessarily execute serially (or observably so). You have to manually express the available parallelism by providing keys (and windows, though no runner that I know of parallelizes over windows today). The two most common approaches are:
Use some natural key like a user ID
Choose some fixed number of shards and key randomly. This can be harder to tune. You have to have enough shards to get enough parallelism, but each shard needs to include enough data that GroupIntoBatches is actually useful.
Adding one dummy key to all elements as in your snippet will cause the transform to not execute in parallel at all. This is similar to the discussion at Stateful indexing causes ParDo to be run single-threaded on Dataflow Runner.
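For the second approach, a sketch of keying by a fixed number of random shards (10 is an arbitrary example; tune it to the parallelism you need; requires java.util.concurrent.ThreadLocalRandom):
static class RandomShardKeyFn extends DoFn<String, KV<Integer, String>> {
    private static final int NUM_SHARDS = 10;

    @ProcessElement
    public void processElement(ProcessContext c) {
        // assign each element to one of NUM_SHARDS keys so GroupIntoBatches can run in parallel
        int shard = ThreadLocalRandom.current().nextInt(NUM_SHARDS);
        c.output(KV.of(shard, c.element()));
    }
}
The rest of the pipeline stays the same, except that the key type becomes Integer, e.g. GroupIntoBatches.<Integer, String>ofSize(batchSize) with VarIntCoder.of() as the key coder.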