I have the following topology:
Duration duration = Duration.ofSeconds(60);
TimeWindows timeWindows = TimeWindows.of(duration).grace(Duration.ZERO);

// ... source stream (elided in the original)
    .mapValues((key, messages) -> remoteService.sendMessages(messages))
    .flatMapValues(results -> results)
    .map((key, result) -> KeyValue.pair(getAggregationKey(result), getAggregationResult(systemClock, result)))
    .groupByKey(Grouped.with(createJsonSerde(AggregationKey.class), createJsonSerde(AggregationResult.class)))
    .windowedBy(timeWindows) // assumed: the window defined above is applied here
    .reduce((aggregatedResult, v) -> {
        int count = aggregatedResult.getCount();
        return aggregatedResult.toBuilder().count(count + 1).build();
    })
    // ... rest of the topology (elided in the original)
My assumption is that aggregation results should be sent to the sink topic roughly every 60 seconds, but I noticed that duplicates are sometimes sent (the numbers are approximate): the first event was sent at the 50th second with a count of 1000, and then at the 58th second an event with the same key was sent with a count of 1050. This does not happen every minute, but quite frequently. Why could this happen?
I also noticed that the second event always has a smaller timestamp than the first one but a larger offset. The same is true for the internal reduce topic.
We have a service where people can order a battery with their solar panels. As part of provisioning, we try to fetch some details about the battery product. However, this sometimes fails to return any data, and we still want to send the order through to our CRM system.
To achieve this, we are using a leftJoin from the latest version of Kafka Streams:
1. We receive an event on the order-received topic.
2. We filter out orders that do not contain a battery product.
3. We then wait up to 30 minutes for an event to come through on the order-battery-details topic.
4. If we don't receive that event, we want to send a new event to the battery-order topic with the data we do have.
This seems to work fine when we receive both events; however, it is inconsistent when we only receive the first event. Sometimes the order comes through immediately after the 30-minute window; sometimes it takes several hours.
My question is: if the window has expired (i.e., we failed to receive the right side of the join), what determines when the event will be sent? And what could be causing the long delay?
Here's a high level example of our service:
class BatteryOrderProducer {
    // hypothetical logger; the original logging calls were lost in extraction
    private val logger = LoggerFactory.getLogger(BatteryOrderProducer::class.java)

    fun buildPipeline(streamsBuilder: StreamsBuilder) {
        // listen for new orders and filter out everything except orders with a battery
        val orderReceivedReceivedStream = streamsBuilder.stream(
            "order-received",
            Consumed.with(Serdes.String(), JsonSerde<OrderReceivedEvent>())
        ).filter { _, order ->
            // check if the order contains a battery product
        }.peek { key, order -> logger.info("Received order with a battery product: $key", order) }

        // listen for battery details events
        val batteryDetailsStream = streamsBuilder.stream(
            "order-battery-details",
            Consumed.with(Serdes.String(), JsonSerde<BatteryDetailsEvent>())
        ).peek { key, details -> logger.info("Received battery details: $key", details) }

        val valueJoiner: ValueJoiner<OrderReceivedEvent, BatteryDetailsEvent?, BatteryOrder> =
            ValueJoiner { orderReceived: OrderReceivedEvent, BatteryDetails: BatteryDetailsEvent? ->
                // new BatteryOrder
                if (BatteryDetails != null) {
                    // add battery details to the order if we get them
                }
                // return the BatteryOrder
            }

        // we always want to send through the battery order, even if we don't get the 2nd event.
        orderReceivedReceivedStream.leftJoin(
            batteryDetailsStream,
            valueJoiner,
            JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(30)) // assumed 30-minute window
        ).peek { key, value -> logger.info("Merged BatteryOrder", value) }
            .to("battery-order")
    }
}
The leftJoin will not trigger as long as there are no new records. So if I have an order-received record with key A at time t, and then there is no new record (on either side of the join) for the next 5 hours, there will be no output from the join for those 5 hours, because the leftJoin is not triggered. In particular, the leftJoin needs to receive a record with a timestamp > t + 30m for a null result to be sent.
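The mechanism described above can be illustrated with a small, stdlib-only Java model. It is not Kafka Streams code and ignores details such as grace periods and the internal outer-join store; the class and method names are hypothetical. The only point it makes is that stream time advances when a record arrives, so a pending left record is emitted with a null right side only once some later record pushes stream time past its timestamp plus the join window.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified model (not the Kafka Streams implementation) of when a windowed
 * leftJoin emits "left + null": only when a NEW record on either side pushes
 * stream time past leftTimestamp + window. No new records, no output.
 */
final class LeftJoinStreamTimeModel {
    private final long windowMs;
    private long streamTime = Long.MIN_VALUE;
    private final Map<String, Long> pendingLeft = new LinkedHashMap<>(); // key -> left timestamp
    final List<String> output = new ArrayList<>();

    LeftJoinStreamTimeModel(long windowMs) {
        this.windowMs = windowMs;
    }

    void onLeft(String key, long ts) {
        advance(ts);
        pendingLeft.put(key, ts);
    }

    void onRight(String key, long ts) {
        advance(ts);
        Long leftTs = pendingLeft.get(key);
        if (leftTs != null && Math.abs(ts - leftTs) <= windowMs) {
            pendingLeft.remove(key);
            output.add(key + "+details"); // inner-join result
        }
    }

    // Stream time only moves forward, and only when some record arrives.
    private void advance(long ts) {
        streamTime = Math.max(streamTime, ts);
        Iterator<Map.Entry<String, Long>> it = pendingLeft.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (streamTime > e.getValue() + windowMs) {
                output.add(e.getKey() + "+null"); // window expired without a match
                it.remove();
            }
        }
    }
}
```

With a 30-minute window, an order A at t = 0 produces nothing until another record arrives; a record 5 hours later immediately flushes `A+null`.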
I think that to satisfy your requirements, you need to work with the lower-level Processor API:
In a Processor, you can define a Punctuator that runs regularly and checks if an order has been waiting for more than half an hour for details, and sends off the null record accordingly.
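A Punctuator decouples the timeout from incoming traffic. The sketch below models only the callback's logic in plain Java, assuming pending orders are kept in a map keyed by order id with their arrival time. In a real Processor that map would be a state store, and the sweep would be registered with something like `context.schedule(Duration.ofMinutes(1), PunctuationType.WALL_CLOCK_TIME, timestamp -> ...)`. `PendingOrderSweeper` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java model of what a wall-clock Punctuator could do: forward orders
 * whose details never arrived once they have waited maxWaitMs.
 */
final class PendingOrderSweeper {
    private final long maxWaitMs;
    private final Map<String, Long> pending = new LinkedHashMap<>(); // orderId -> arrival time
    final List<String> forwarded = new ArrayList<>();

    PendingOrderSweeper(long maxWaitMs) {
        this.maxWaitMs = maxWaitMs;
    }

    void onOrder(String orderId, long nowMs) {
        pending.put(orderId, nowMs);
    }

    void onDetails(String orderId) {
        // Details arrived in time: forward the enriched order and stop waiting.
        if (pending.remove(orderId) != null) {
            forwarded.add(orderId + "+details");
        }
    }

    // Runs on a wall-clock schedule, independent of new input records.
    void punctuate(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMs - e.getValue() >= maxWaitMs) {
                forwarded.add(e.getKey() + "+null");
                it.remove();
            }
        }
    }
}
```

With this design, the worst-case delay after the 30-minute deadline is roughly the punctuation interval, whether or not any new records arrive.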
Is there a way to change the Flowable.interval period at runtime?

System.out.println("Start generating bullshit for 7 seconds:");
Flowable.interval(3, TimeUnit.SECONDS)
        .map(tick -> random.nextInt(100))
        .subscribe(tick -> System.out.println("tick = " + tick));
TimeUnit.SECONDS.sleep(7);
System.out.println("Change interval to 2 seconds:");
I have a workaround, but the best way would be to create a new operator.
How does this solution work?
You have a trigger source that provides values indicating when to start a new interval. The source is switchMapped with an interval as the inner stream. The inner stream takes an input value from the upstream source to set the new interval time.
When the source emits a time (Long), the switchMap lambda is invoked and the returned Flowable is subscribed to immediately. When a new value arrives at the switchMap, the inner Flowable interval is unsubscribed from and the lambda is invoked once again. The returned interval Flowable is then subscribed to.
This means that on each emission from the source, a new interval is created.
How does it behave?
When the interval is subscribed to and is about to emit a new value, and a new value is emitted from the source, the inner stream (the interval) is unsubscribed from. Therefore, that value is never emitted. The new interval Flowable is subscribed to and emits values according to its configuration.
lateinit var scheduler: TestScheduler

fun init() {
    scheduler = TestScheduler()
}

fun `62232235`() {
    val trigger = PublishSubject.create<Long>()
    val switchMap = trigger.toFlowable(BackpressureStrategy.LATEST)
        // make sure that a value is emitted from upstream, so that at least one interval emits values when the upstream source does not provide a seed value
        // ... (seed operator elided in the original)
        .switchMap {
            Flowable.interval(it, TimeUnit.SECONDS, scheduler)
                .map { tick: Long? ->
                    // ... (mapping body elided in the original)
                }
        }

    val test = switchMap.test()
    scheduler.advanceTimeBy(10, TimeUnit.SECONDS)
    test.assertValues(0, 1, 2)

    // send new onNext value at absolute time 10
    trigger.onNext(10)
    // the inner stream is unsubscribed and a new stream with interval(10) is subscribed to. Therefore the first value will be emitted at 20 (current: 10 + 10 configured)
    scheduler.advanceTimeTo(21, TimeUnit.SECONDS)
    // if the switch did not happen, there would be 7 values
    test.assertValues(0, 1, 2, 0)
}
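The timing in the test can be reproduced without RxJava. The following virtual-time model in plain Java assumes the elided seed value is a 3-second period emitted at time 0 (that is what makes `assertValues(0, 1, 2)` after 10 seconds and the "7 values" comment consistent). It also treats a tick scheduled at exactly the switch time as cancelled, which is an assumption. `SwitchMapIntervalModel` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Virtual-time model of trigger.switchMap(period -> interval(period)):
 * each trigger value disposes the current interval and starts a new one whose
 * counter restarts at 0 and whose first tick is one full period later.
 */
final class SwitchMapIntervalModel {
    /**
     * @param triggerTimes times (seconds) at which the trigger emits, ascending
     * @param periods      period (seconds) emitted at each trigger time
     * @param until        simulate up to and including this time
     * @return tick values observed downstream
     */
    static List<Long> ticks(long[] triggerTimes, long[] periods, long until) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < triggerTimes.length; i++) {
            long start = triggerTimes[i];
            long period = periods[i];
            // The inner interval is disposed when the next trigger value arrives.
            long end = (i + 1 < triggerTimes.length) ? triggerTimes[i + 1] : Long.MAX_VALUE;
            long tick = 0;
            for (long t = start + period; t <= until && t < end; t += period) {
                out.add(tick++);
            }
        }
        return out;
    }
}
```

Switching from a 3-second to a 10-second period at t = 10 yields ticks 0, 1, 2 (at 3, 6, 9) and then 0 (at 20), matching the test; without the switch, seven ticks occur by t = 21.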
My requirement is to group the messages that arrive on my Kafka topic by key, using a window of 5 seconds, so my application should return the grouped elements every 5 seconds. Below is the application I have written. The problem is that it does not return the group of events after 5 seconds; it returns them much later, randomly after 15 seconds, 30 seconds, or more.
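KStream<String, String> source = builder
        .stream(sourceTopic, Consumed.with(Serdes.String(), Serdes.String()))
        .filter((key, value) -> Objects.nonNull(value));

final KTable<Windowed<String>, List<String>> aggTable = source
        .groupByKey(Serialized.with(Serdes.String(), new JsonSerde<>(String.class, objectMapper)))
        .windowedBy(TimeWindows.of(TimeUnit.SECONDS.toMillis(5))) // assumed: 5-second window, per the description above
        .aggregate(ArrayList<String>::new, (key, value, aggregater) -> { // List is an interface, so use ArrayList::new
            aggregater.add(value); // assumed: collect each value into the window's list
            return aggregater;
        },
        Materialized.<String, List<String>, WindowStore<Bytes, byte[]>>as("stateStore")
                /* ... value serde for List<String> (elided in the original) */);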
Do we need any extra code to make the stream emit the result immediately after the specified window time?
My Flink application does the following:
1. source: read data in the form of records from Kafka
2. split: based on certain criteria
3. window: a time window of 10 seconds to aggregate records into one bulk record
4. sink: dump these bulk records to Elasticsearch
I am facing an issue where the Flink consumer is not able to hold data for 10 seconds and throws the following exception:
Caused by: java.util.concurrent.ExecutionException: Size of the state is larger than the maximum permitted memory-backed state. Size=18340663 , maxSize=5242880
I cannot apply a countWindow, because if records arrive too slowly, the Elasticsearch sink might be deferred for a long time.
My question:
Is it possible to apply an OR of TimeWindow and CountWindow, which works as follows:
> if ( recordCount is 500 OR 10 seconds have elapsed)
> then dump data to flink
Not directly. But you can use a GlobalWindow with custom triggering logic. Take a look at the source of Flink's CountTrigger for reference.
Your triggering logic will look something like this.
// hypothetical class name; Sum is a ReduceFunction<Long> that adds two longs, as in Flink's CountTrigger
public class CountOrTimeTrigger extends Trigger<String, GlobalWindow> {
    private final ReducingStateDescriptor<Long> stateDesc =
            new ReducingStateDescriptor<>("count", new Sum(), LongSerializer.INSTANCE);
    private long triggerTimestamp = 0;

    @Override
    public TriggerResult onElement(String element, long l, GlobalWindow globalWindow, TriggerContext triggerContext) throws Exception {
        ReducingState<Long> count = triggerContext.getPartitionedState(stateDesc);
        // Increment window counter by one, when an element is received
        count.add(1L);
        // Start the timer when the first packet is received
        if (count.get() == 1) {
            triggerTimestamp = triggerContext.getCurrentProcessingTime() + 10000; // trigger at 10 seconds from reception of first event
            triggerContext.registerProcessingTimeTimer(triggerTimestamp); // Override the onProcessingTime method to trigger the window at this time
        }
        // Or trigger the window when the number of packets in the window reaches 500
        if (count.get() >= 500) {
            // Delete the timer, clear the count and fire the window; FIRE_AND_PURGE also drops the
            // buffered elements, which a GlobalWindow would otherwise keep accumulating
            triggerContext.deleteProcessingTimeTimer(triggerTimestamp);
            count.clear();
            return TriggerResult.FIRE_AND_PURGE;
        }
        return TriggerResult.CONTINUE;
    }
    // onProcessingTime, onEventTime and clear still need to be implemented
}
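The OR of count and time can be checked in isolation with a plain-Java model of the trigger's logic for a single key, using explicit timestamps instead of Flink's `TriggerContext`. It is a sketch, not Flink API; `CountOrTimeoutBatcher` is a hypothetical name, and each recorded batch corresponds to one `FIRE_AND_PURGE`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Models "fire when maxCount elements have arrived OR timeoutMs after the
 * first element", whichever comes first, with virtual time.
 */
final class CountOrTimeoutBatcher {
    private final int maxCount;
    private final long timeoutMs;
    private final List<String> buffer = new ArrayList<>();
    private long deadline = Long.MAX_VALUE; // "timer" registered on the first element
    final List<List<String>> batches = new ArrayList<>();

    CountOrTimeoutBatcher(int maxCount, long timeoutMs) {
        this.maxCount = maxCount;
        this.timeoutMs = timeoutMs;
    }

    void onElement(String element, long nowMs) {
        onTime(nowMs); // let an expired timer fire before accepting the new element
        buffer.add(element);
        if (buffer.size() == 1) {
            deadline = nowMs + timeoutMs;
        }
        if (buffer.size() >= maxCount) {
            fire(); // count reached: fire and cancel the timer
        }
    }

    // Equivalent of onProcessingTime: fire if the deadline has been reached.
    void onTime(long nowMs) {
        if (!buffer.isEmpty() && nowMs >= deadline) {
            fire();
        }
    }

    private void fire() {
        batches.add(new ArrayList<>(buffer)); // emit the batch
        buffer.clear();                       // purge the window contents
        deadline = Long.MAX_VALUE;            // delete the timer
    }
}
```

With `maxCount = 500` and `timeoutMs = 10_000` this matches the rule in the question: a batch goes out at 500 records or 10 seconds after its first record, whichever comes first.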
You could also use the RocksDB state backend, but a custom Trigger will perform better.
What I'd like to do is this:
1. Consume records from a numbers topic (Longs)
2. Aggregate (count) the values for each 5-second window
3. Send the FINAL aggregation result to another topic
My code looks like this:
KStream<String, Long> longs = builder.stream(
        Serdes.String(), Serdes.Long(), "longs");

// In one ktable, count by key, on a five second tumbling window.
KTable<Windowed<String>, Long> longCounts =
        longs.countByKey(TimeWindows.of("longCounts", 5000L));

// Finally, sink to the long-avgs topic.
longCounts.toStream((wk, v) -> wk.key())
        .to(Serdes.String(), Serdes.Long(), "long-avgs");
Everything seems to work as expected, but an aggregation result is sent to the destination topic for each incoming record. My question is: how can I send only the final aggregation result of each window?
In Kafka Streams there is no such thing as a "final aggregation". Windows are kept open all the time to handle out-of-order records that arrive after the window end-time passed. However, windows are not kept forever. They get discarded once their retention time expires. There is no special action as to when a window gets discarded.
See the Confluent documentation for more details:
Thus, for each update to an aggregation, a result record is produced (because Kafka Streams also updates the aggregation result for out-of-order records). Your "final result" would be the latest result record (before a window gets discarded). Depending on your use case, manual de-duplication would be a way to resolve the issue (using the lower-level API, transform() or process()).
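To see why every incoming record produces an output, here is a stdlib-only Java model of a windowed count treated as a changelog (hypothetical names; tumbling windows and non-negative timestamps assumed; not the Kafka Streams API). Each record, including a late, out-of-order one, emits an updated count for its window, and the "final" value per window is just the last update.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Windowed count where every input record emits an updated result. */
final class WindowedCountChangelog {
    private final long windowSizeMs;
    private final Map<String, Long> counts = new LinkedHashMap<>();
    final List<Map.Entry<String, Long>> updates = new ArrayList<>(); // what goes downstream

    WindowedCountChangelog(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    void onRecord(String key, long ts) {
        long windowStart = ts - (ts % windowSizeMs); // tumbling window containing ts
        String windowKey = key + "@" + windowStart;
        long count = counts.merge(windowKey, 1L, Long::sum);
        updates.add(Map.entry(windowKey, count)); // one output per input record
    }

    // Manual de-duplication: keep only the latest update per window.
    Map<String, Long> latestPerWindow() {
        Map<String, Long> latest = new LinkedHashMap<>();
        for (Map.Entry<String, Long> u : updates) {
            latest.put(u.getKey(), u.getValue());
        }
        return latest;
    }
}
```

Collapsing to the latest update is easy after the fact; the hard part in a live stream is knowing when a window will receive no more updates.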
This blog post might help, too:
Another blog post addressing this issue without using punctuations:
With KIP-328, a KTable#suppress() operator was added that allows suppressing consecutive updates in a strict manner and emitting a single result record per window; the tradeoff is increased latency.
From Kafka Streams version 2.1, you can achieve this using suppress.
Here is an example from the Apache Kafka Streams documentation mentioned above that sends an alert when a user has fewer than three events in an hour:
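KGroupedStream<UserId, Event> grouped = ...;
grouped.windowedBy(TimeWindows.of(Duration.ofHours(1)).grace(ofMinutes(10)))
        .count().suppress(Suppressed.untilWindowCloses(unbounded()))
        .filter((windowedUserId, count) -> count < 3)
        .toStream()
        .foreach((windowedUserId, count) -> sendAlert(windowedUserId.window(), windowedUserId.key(), count));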
As mentioned in the update of this answer, you should be aware of the tradeoff. Moreover, note that suppress() is based on event-time.
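Because `suppress()` works on event time, a window's final result appears only when a later record advances stream time past the window end plus grace. The following plain-Java model (a single key, hypothetical names, not the Kafka Streams implementation) shows that behavior and how records arriving after the grace period are dropped.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Emits one final count per tumbling window once stream time (the max event
 * timestamp seen) reaches windowEnd + grace, in the spirit of
 * suppress(untilWindowCloses(...)).
 */
final class FinalWindowResults {
    private final long sizeMs;
    private final long graceMs;
    private long streamTime = Long.MIN_VALUE;
    private final Map<Long, Long> openCounts = new LinkedHashMap<>(); // windowStart -> count
    final List<String> emitted = new ArrayList<>();

    FinalWindowResults(long sizeMs, long graceMs) {
        this.sizeMs = sizeMs;
        this.graceMs = graceMs;
    }

    void onRecord(long ts) {
        streamTime = Math.max(streamTime, ts);
        long windowStart = ts - (ts % sizeMs);
        if (windowStart + sizeMs + graceMs > streamTime) {
            openCounts.merge(windowStart, 1L, Long::sum);
        } // else: the record's window is already closed, so it is dropped
        // Emit, then forget, every window whose end + grace is not after stream time.
        Iterator<Map.Entry<Long, Long>> it = openCounts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> e = it.next();
            if (e.getKey() + sizeMs + graceMs <= streamTime) {
                emitted.add("[" + e.getKey() + "," + (e.getKey() + sizeMs) + ")=" + e.getValue());
                it.remove();
            }
        }
    }
}
```

With a grace period of zero, the window [0, 5000) closes as soon as a record with timestamp 5000 arrives; if no such record ever arrives, nothing is emitted for that window.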
I faced the same issue and solved it by adding grace(0) to the fixed window and using the Suppressed API:
public void process(KStream<SensorKeyDTO, SensorDataDTO> stream) {
    buildAggregateMetricsBySensor(stream)
            .to(outputTopic, Produced.with(String(), new SensorAggregateMetricsSerde()));
}

private KStream<String, SensorAggregateMetricsDTO> buildAggregateMetricsBySensor(KStream<SensorKeyDTO, SensorDataDTO> stream) {
    return stream
            .map((key, val) -> new KeyValue<>(val.getId(), val))
            .groupByKey(Grouped.with(String(), new SensorDataSerde()))
            // WINDOW_SIZE is a hypothetical constant; the window size is not shown in the original
            .windowedBy(TimeWindows.of(WINDOW_SIZE).grace(Duration.ZERO))
            .aggregate(SensorAggregateMetricsDTO::new, // assumed initializer
                    (String k, SensorDataDTO v, SensorAggregateMetricsDTO va) -> aggregateData(v, va),
                    buildWindowPersistentStore())
            .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
            .toStream()
            .map((key, value) -> KeyValue.pair(key.key(), value));
}

private Materialized<String, SensorAggregateMetricsDTO, WindowStore<Bytes, byte[]>> buildWindowPersistentStore() {
    return Materialized
            .<String, SensorAggregateMetricsDTO, WindowStore<Bytes, byte[]>>as(WINDOW_STORE_NAME)
            .withValueSerde(new SensorAggregateMetricsSerde());
}
Here you can see the result