I am trying to aggregate a large amount of data with Kafka Streams, using time windows of different sizes.
I increased the cache size to 2 GB, but when I set the window size to 1 hour the CPU load goes to 100% and the application slows down.
My code looks like this:
val tradeStream = builder.stream<String, Trade>(configuration.topicNamePattern, Consumed.with(Serdes.String(), JsonSerde(Trade::class.java)))
tradeStream
    .groupBy(
        { _, trade -> trade.pair },
        Serialized.with(JsonSerde(TokensPair::class.java), JsonSerde(Trade::class.java))
    )
    .windowedBy(TimeWindows.of(windowDuration).advanceBy(windowHop).until(windowDuration))
    .aggregate(
        { Ticker(windowDuration) },
        { _, newValue, aggregate -> aggregate.add(newValue) },
        Materialized.`as`<TokensPair, Ticker>(storeByPairs)
            .withKeySerde(JsonSerde(TokensPair::class.java))
            .withValueSerde(JsonSerde(Ticker::class.java))
    )
    .toStream()
    .filter { tokensPair, _ -> filterFinishedWindow(tokensPair.window(), windowHop) }
    .map { tokensPair, ticker -> KeyValue(
        TickerKey(ticker.tokensPair!!, windowDuration, Timestamp(tokensPair.window().start())),
        ticker.calcPrice()
    )}
    .to(topicName, Produced.with(JsonSerde(TickerKey::class.java), JsonSerde(Ticker::class.java)))
In addition, before the aggregated data is sent to the Kafka topic, it is filtered by the window end time so that only finished windows are written to the topic.
Is there a better approach for implementing this kind of aggregation?
Without knowing a bit more about the system, it's hard to diagnose.
How many partitions are present in your cluster?
How many stream applications are you running?
Are the stream applications running on the same machine?
Are you using compression for the payload?
Does it work for smaller intervals?
Hope that helps.
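One thing worth checking for the question above is how many windows each record lands in. With hopping windows, a record is added to every window that overlaps its timestamp, so the aggregator runs roughly windowDuration / windowHop times per record. The sketch below reproduces that arithmetic in plain Java. The helper name and the 1-minute hop are assumptions for illustration, since the question does not say what windowHop is set to.

```java
import java.util.ArrayList;
import java.util.List;

class HoppingWindows {
    // Returns the start times of all hopping windows [start, start + size) that contain the timestamp.
    // Hypothetical helper, not part of the Kafka Streams API.
    static List<Long> windowStartsFor(long timestampMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Earliest window start (aligned to the hop) whose window still covers the timestamp.
        long first = Math.max(0L, timestampMs - sizeMs + advanceMs) / advanceMs * advanceMs;
        for (long start = first; start <= timestampMs; start += advanceMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long oneHour = 3_600_000L;
        long oneMinute = 60_000L; // assumed hop, not given in the question
        long timestamp = 10 * oneHour;
        System.out.println(windowStartsFor(timestamp, oneHour, oneMinute).size()); // prints 60
    }
}
```

If the hop is small relative to a 1-hour window, every trade triggers dozens of deserialize, aggregate, and serialize cycles through the JSON serdes, which may account for much of the CPU load.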
Related
Problem statement: consume a million records from Kafka and make parallel API calls (120 TPS)
I'm using Project Reactor Kafka for Kafka message consumption (2 million records per hour). Once I receive the Kafka messages, I need to make parallel API calls (10 TPS) to "abc.com/actuator". I have tested the Kafka part: I'm able to consume a million records in 20 minutes (with 4 Kubernetes pods). But when I make the API calls, everything runs sequentially rather than in parallel. Also, the API takes 1000 ms to return a response, which adds to the waiting time. Can someone help me understand what's wrong with the parallel API calls? Thanks in advance.
ReceiverOptions<Integer, String> options =
    receiverOptions
        .subscription(Collections.singleton(topic))
        .addAssignListener(partitions -> log.debug("onPartitionsAssigned {}", partitions))
        .addRevokeListener(partitions -> log.debug("onPartitionsRevoked {}", partitions));
final Flux<ReceiverRecord<Integer, String>> messages = Flux.defer(() -> {
    final Flux<ReceiverRecord<Integer, String>> receiver =
        KafkaReceiver.create(options).receive();
    return Flux.<ReceiverRecord<Integer, String>>create(emitter -> {
        receiver.doOnNext(record -> {
            ReceiverOffset offset = record.receiverOffset();
            offset.acknowledge();
            emitter.next(record);
        }).blockLast();
    });
});
WebClient wc = WebClient.create("abc.com:8443");
Flux.from(messages)
    .flatMap(event -> wc.get().uri("/actuator").retrieve().bodyToMono(String.class))
    .parallel(10).runOn(Schedulers.parallel()).subscribe();
Kubernetes configuration :
CPU : 300m
Memory : 10Gi
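A rough way to reason about the throughput target: with a response time of about 1000 ms, sustaining N calls per second needs about N calls in flight at once (Little's law). The plain-Java sketch below works that number out and shows a bounded-concurrency runner. The class, the method names, and the simulated latency are assumptions for illustration; in Reactor the analogous knob would be the concurrency argument of flatMap(mapper, concurrency).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedCalls {
    // In-flight calls needed to sustain targetTps when each call takes latencyMs.
    static int requiredConcurrency(double targetTps, long latencyMs) {
        return (int) Math.ceil(targetTps * latencyMs / 1000.0);
    }

    // Runs `calls` simulated requests on a pool of `concurrency` threads and
    // returns the highest number of requests that were in flight at once.
    static int maxInFlight(int calls, int concurrency, long latencyMs) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            futures.add(CompletableFuture.runAsync(() -> {
                max.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(latencyMs); // stands in for the remote call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                inFlight.decrementAndGet();
            }, pool));
        }
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(requiredConcurrency(10, 1000));  // prints 10
        System.out.println(requiredConcurrency(120, 1000)); // prints 120
        System.out.println(maxInFlight(20, 4, 20) <= 4);    // prints true
    }
}
```

Note that a 300m CPU limit leaves well under one core per pod, so blocking calls such as blockLast() inside the pipeline may hurt more than usual in this setup.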
I need to poll Kafka and process events in bulk. Since Reactor Kafka is a streaming API, I receive the events as a stream. Is there a way to combine them into batches with a fixed maximum size?
This is what I am doing currently.
final Flux<Flux<ConsumerRecord<String, String>>> receive = KafkaReceiver.create(eventReceiverOptions)
    .receiveAutoAck();
receive
    .concatMap(r -> r)
    .doOnEach(listSignal -> log.info("got one message"))
    .map(consumerRecords -> consumerRecords.value())
    .collectList()
    .flatMap(strings -> {
        log.info("Read messages of size {}", strings.size());
        return processBulkMessage(strings)
            .doOnSuccess(aBoolean -> log.info("Processed records"))
            .thenReturn(strings);
    }).subscribe();
But the code just hangs after collectList and never reaches the last flatMap.
Thanks In advance.
Your plain .concatMap(r -> r) just flattens the stream, so it throws away the batching that receiveAutoAck() originally built. To get a stream of lists for processBulkMessage() to process, consider moving all of the batch logic into that concatMap():
.concatMap(batch -> batch
    .doOnEach(listSignal -> log.info("got one message"))
    .map(ConsumerRecord::value)
    .collectList())
.flatMap(strings -> {
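collectList() only emits once its source completes, and a Kafka receiver flux normally never completes, which is why the original pipeline appears to hang. If a fixed maximum batch size is wanted regardless of how the poll batches arrive, Reactor's buffer(int) and bufferTimeout(int, Duration) operators are the usual candidates. The plain-Java sketch below only illustrates the chunking they perform; the chunk helper is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class Chunker {
    // Splits items into consecutive lists of at most maxSize elements.
    static <T> List<List<T>> chunk(List<T> items, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += maxSize) {
            int to = Math.min(from + maxSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> values = List.of("a", "b", "c", "d", "e", "f", "g");
        System.out.println(chunk(values, 3)); // prints [[a, b, c], [d, e, f], [g]]
    }
}
```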
I currently have a Kafka Stream service:
{
  val _ = metrics
  val timeWindow = Duration.of(config.timeWindow.toMillis, ChronoUnit.MILLIS)
  val gracePeriod = Duration.of(config.gracePeriod.toMillis, ChronoUnit.MILLIS)
  val store = Materialized
    .as[AggregateKey, AggMetricDocument, ByteArrayWindowStore](
      config.storeNames.reducerStateStore
    )
    .withRetention(gracePeriod.plus(timeWindow))
    .withCachingEnabled()
  builder
    .stream[AggregateKey, AggMetricDocument](config.topicNames.aggMetricDocumentsIntermediate)
    .groupByKey
    .windowedBy(TimeWindows.of(timeWindow).grace(gracePeriod))
    .reduce { (metricDoc1, metricDoc2) =>
      metricDoc1.copy(
        metrics = metricDoc1.metrics
          .merge(metricDoc2.metrics, config.metricDocumentsReducerSamplesReservoirSize),
        docsCount = metricDoc1.docsCount + metricDoc2.docsCount
      )
    }(store)
    .toStream
    .to(config.topicNames.aggMetricDocuments)(
      Produced.`with`(AggregateKey.windowedSerde, AggMetricDocument.flattenSerde)
    )
}
where
timeWindow=1m
gracePeriod=39h
The stream works fine with normal cardinality, but when it starts processing high-cardinality data (more than 100 million different keys) the processing rate declines after some time.
Looking at the RocksDB metrics, the average fetch latency rises from 30µs to 600µs and the hit rate of the filters and index decreases, as seen in the following test (sending ~15K messages/sec with unique keys):
The disk throughput and IO seem to be under the disk limits.
CPU usage and load average increase (the limit is 5 cores):
I made some RocksDB config modifications:
private val cache = new LRUCache(2147483648L, -1, false, 0.9) // 2GB
private val writeBufferManager = new WriteBufferManager(2147483648L, cache)
val tableConfig = options.tableFormatConfig.asInstanceOf[BlockBasedTableConfig]
tableConfig.setBlockCache(BoundedMemoryRocksDBConfig.cache)
tableConfig.setCacheIndexAndFilterBlocks(true) // Default false
tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true)
All other settings have the Kafka Streams default values.
Increasing the LRUCache seems to help for a while.
I am not sure what the core problem is. Does anyone have an idea of what is causing this and which configuration I should tune to get better performance on high-cardinality data?
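A back-of-the-envelope estimate of the store size may help frame the problem. The sketch below assumes the test conditions described above hold continuously (about 15K records per second, every key unique, so each record creates a new window-store entry) and that entries live for the configured retention of grace period plus window size. The class and method names are made up for illustration.

```java
import java.time.Duration;

class WindowStoreEstimate {
    // Entries alive at steady state if every record creates a new (key, window) entry.
    static long retainedEntries(long newKeysPerSecond, Duration window, Duration grace) {
        Duration retention = window.plus(grace);
        return newKeysPerSecond * retention.getSeconds();
    }

    public static void main(String[] args) {
        long entries = retainedEntries(15_000, Duration.ofMinutes(1), Duration.ofHours(39));
        long cacheBytes = 2_147_483_648L; // the 2GB LRUCache from the question
        System.out.println(entries); // prints 2106900000
        // Cache budget per retained entry, in bytes (about one byte here).
        System.out.println((double) cacheBytes / entries);
    }
}
```

Under these assumptions the store holds on the order of two billion entries, so a 2 GB block cache can cover only a small fraction of the index, filter, and data blocks. That would be consistent with the falling hit rates, and with a larger cache helping only for a while.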
I'm writing a spark streaming job that reads data from Kafka, makes some changes to the records and sends the results to another Kafka cluster.
The job is very slow: the processing rate is about 70,000 records per second. Sampling shows that 30% of the time is spent reading and processing the data and the remaining 70% is spent sending data to Kafka.
I've tried to tweak the Kafka configurations, add memory, change batch intervals, but the only change that works is to add more cores.
profiler:
Spark job details:
max.cores 30
driver memory 6G
executor memory 16G
batch.interval 3 minutes
ingress rate 180,000 messages per second
Producer Properties (I've tried different variations)
def buildProducerKafkaProperties: Properties = {
  val producerConfig = new Properties
  producerConfig.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, destKafkaBrokers)
  producerConfig.put(ProducerConfig.ACKS_CONFIG, "all")
  producerConfig.put(ProducerConfig.BATCH_SIZE_CONFIG, "200000")
  producerConfig.put(ProducerConfig.LINGER_MS_CONFIG, "2000")
  producerConfig.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip")
  producerConfig.put(ProducerConfig.RETRIES_CONFIG, "0")
  producerConfig.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "13421728")
  producerConfig.put(ProducerConfig.SEND_BUFFER_CONFIG, "13421728")
  producerConfig
}
Sending code
stream
  .foreachRDD(rdd => {
    val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    rdd
      .map(consumerRecord => doSomething(consumerRecord))
      .foreachPartition(partitionIter => {
        val producer = kafkaSinkBroadcast.value
        partitionIter.foreach(row => {
          producer.send(kafkaTopic, row)
          producedRecordsAcc.add(1)
        })
      })
    stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
  })
Versions
Spark Standalone cluster 2.3.1
Destination Kafka cluster 1.1.1
Kafka topic has 120 partitions
Can anyone suggest how to increase sending throughput?
Update Jul 2019
size: 150k messages per second, each message has about 100 columns.
main settings:
spark.cores.max = 30 # the cores balanced between all the workers.
spark.streaming.backpressure.enabled = true
ob.ingest.batch.duration= 3 minutes
I've tried to use rdd.repartition(30), but it made the execution slower by ~10%
Thanks
Try using repartition as shown below:
// numExecutors and executorCores are placeholders for your cluster's values
val numPartitions = numExecutors * executorCores
stream
  .foreachRDD(rdd => {
    // Read the offsets before repartitioning: the cast to HasOffsetRanges
    // only succeeds on the RDD that comes straight from the direct stream.
    val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    rdd
      .repartition(numPartitions)
      .map(consumerRecord => doSomething(consumerRecord))
      .foreachPartition(partitionIter => {
        val producer = kafkaSinkBroadcast.value
        partitionIter.foreach(row => {
          producer.send(kafkaTopic, row)
          producedRecordsAcc.add(1)
        })
      })
    stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
  })
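Matching the number of partitions to the total number of cores spreads the sending work evenly across them, which should help when the input partitions are unevenly loaded.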
Hope this will help.
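Separately from repartitioning, the producer settings in the question may deserve a sanity check. The producer carves its per-partition batches out of buffer.memory, and send() blocks once that pool is exhausted. The sketch below compares the configured values, assuming (this is not stated in the question) that each producer instance ends up writing to all 120 partitions of the destination topic. The class and method names are hypothetical.

```java
class ProducerBufferCheck {
    // Memory needed if one batch of batchSizeBytes is open for every partition.
    static long bytesForOpenBatches(int partitions, long batchSizeBytes) {
        return partitions * batchSizeBytes;
    }

    public static void main(String[] args) {
        long needed = bytesForOpenBatches(120, 200_000L); // batch.size from the question
        long bufferMemory = 13_421_728L;                  // buffer.memory from the question
        System.out.println(needed);                // prints 24000000
        System.out.println(needed > bufferMemory); // prints true
    }
}
```

If that assumption holds, the buffer cannot hold one full batch per partition, so raising buffer.memory or lowering batch.size might reduce time spent blocked in send(). Since adding cores was the only change that helped, the job may also be CPU-bound on gzip compression, so trying a lighter codec such as lz4 or snappy could be worthwhile.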
What I'd like to do is this:
Consume records from a numbers topic (Long's)
Aggregate (count) the values for each 5 sec window
Send the FINAL aggregation result to another topic
My code looks like this:
KStream<String, Long> longs = builder.stream(
    Serdes.String(), Serdes.Long(), "longs");
// In one ktable, count by key, on a five second tumbling window.
KTable<Windowed<String>, Long> longCounts =
    longs.countByKey(TimeWindows.of("longCounts", 5000L));
// Finally, sink to the long-counts topic.
longCounts.toStream((wk, v) -> wk.key())
    .to("long-counts");
It looks like everything works as expected, but the aggregations are sent to the destination topic for each incoming record. My question is how can I send only the final aggregation result of each window?
In Kafka Streams there is no such thing as a "final aggregation". Windows are kept open all the time to handle out-of-order records that arrive after the window end time has passed. However, windows are not kept forever. They get discarded once their retention time expires. No special action is triggered when a window gets discarded.
See Confluent documentation for more details: http://docs.confluent.io/current/streams/
Thus, for each update to an aggregation, a result record is produced (because Kafka Streams also updates the aggregation result on out-of-order records). Your "final result" would be the latest result record (before a window gets discarded). Depending on your use case, manual de-duplication would be a way to resolve the issue (using the lower-level API, transform() or process()).
This blog post might help, too: https://timothyrenner.github.io/engineering/2016/08/11/kafka-streams-not-looking-at-facebook.html
Another blog post addressing this issue without using punctuations: http://blog.inovatrend.com/2018/03/making-of-message-gateway-with-kafka.html
Update
With KIP-328, a KTable#suppress() operator was added that suppresses consecutive updates in a strict manner and emits a single result record per window; the tradeoff is increased latency.
From Kafka Streams version 2.1, you can achieve this using suppress.
There is an example in the Apache Kafka Streams documentation that sends an alert when a user has fewer than three events in an hour:
KGroupedStream<UserId, Event> grouped = ...;
grouped
    .windowedBy(TimeWindows.of(Duration.ofHours(1)).grace(ofMinutes(10)))
    .count()
    .suppress(Suppressed.untilWindowCloses(unbounded()))
    .filter((windowedUserId, count) -> count < 3)
    .toStream()
    .foreach((windowedUserId, count) -> sendAlert(windowedUserId.window(), windowedUserId.key(), count));
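As mentioned in the update to the answer above, you should be aware of the latency tradeoff. Moreover, note that suppress() is based on event time, so a window's result is only emitted once later records advance the stream time past the end of the window plus the grace period.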
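To make the "emit only when the window closes" behaviour concrete, here is a small plain-Java model of the idea: keep the newest value per window, advance a stream-time clock from record timestamps, and release a window only once stream time reaches its end plus the grace period. This is a conceptual sketch under those assumptions, not the library's implementation, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FinalResultBuffer {
    private final long windowSizeMs;
    private final long graceMs;
    private final TreeMap<Long, Long> latestByWindowStart = new TreeMap<>();
    private long streamTime = Long.MIN_VALUE;

    FinalResultBuffer(long windowSizeMs, long graceMs) {
        this.windowSizeMs = windowSizeMs;
        this.graceMs = graceMs;
    }

    // Stores the newest count for a window and returns every window that has
    // closed as a result, as {windowStart, finalCount} pairs.
    List<long[]> update(long windowStart, long count, long recordTimestamp) {
        // Stream time only moves forward, driven by record timestamps.
        streamTime = Math.max(streamTime, recordTimestamp);
        // Updates for windows that are already closed are dropped as late.
        if (windowStart + windowSizeMs + graceMs > streamTime) {
            latestByWindowStart.put(windowStart, count);
        }
        List<long[]> closed = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = latestByWindowStart.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (entry.getKey() + windowSizeMs + graceMs > streamTime) {
                break; // this window and all later ones are still open
            }
            closed.add(new long[] {entry.getKey(), entry.getValue()});
            it.remove();
        }
        return closed;
    }

    public static void main(String[] args) {
        FinalResultBuffer buffer = new FinalResultBuffer(5000L, 0L);
        System.out.println(buffer.update(0L, 1L, 1000L).size());    // prints 0
        System.out.println(buffer.update(0L, 2L, 4000L).size());    // prints 0
        System.out.println(buffer.update(5000L, 1L, 5000L).size()); // prints 1
    }
}
```

The last call shows the tradeoff: the final count for the first window is released only when a record from a later window arrives, so if no new records come in, nothing is emitted.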
I faced the same issue and solved it by adding grace(0) to the fixed window and using the Suppressed API:
public void process(KStream<SensorKeyDTO, SensorDataDTO> stream) {
    buildAggregateMetricsBySensor(stream)
        .to(outputTopic, Produced.with(String(), new SensorAggregateMetricsSerde()));
}
private KStream<String, SensorAggregateMetricsDTO> buildAggregateMetricsBySensor(KStream<SensorKeyDTO, SensorDataDTO> stream) {
    return stream
        .map((key, val) -> new KeyValue<>(val.getId(), val))
        .groupByKey(Grouped.with(String(), new SensorDataSerde()))
        .windowedBy(TimeWindows.of(Duration.ofMinutes(WINDOW_SIZE_IN_MINUTES)).grace(Duration.ofMillis(0)))
        .aggregate(SensorAggregateMetricsDTO::new,
            (String k, SensorDataDTO v, SensorAggregateMetricsDTO va) -> aggregateData(v, va),
            buildWindowPersistentStore())
        .suppress(Suppressed.untilWindowCloses(unbounded()))
        .toStream()
        .map((key, value) -> KeyValue.pair(key.key(), value));
}
private Materialized<String, SensorAggregateMetricsDTO, WindowStore<Bytes, byte[]>> buildWindowPersistentStore() {
    return Materialized
        .<String, SensorAggregateMetricsDTO, WindowStore<Bytes, byte[]>>as(WINDOW_STORE_NAME)
        .withKeySerde(String())
        .withValueSerde(new SensorAggregateMetricsSerde());
}
Here you can see the result