Kafka Streams aggregate function sinking each joined record incrementally, instead of a single list of aggregated records - apache-kafka

I'm having some issues with my Kafka Streams implementation in production. I've implemented a function that takes a KTable and a KStream and yields another KTable with aggregated results based on the join of the two inputs. The idea is to iterate over a list carried by each KStream record, join each entry with the KTable, aggregate the matching events into a list, and sink to a topic a record containing the original event and the list of joined events (a 1-to-N join).
Context
This is how my component interacts with its context. MovementEvent contains a list of transaction_ids that should match the transaction_id of TransactionEvent; the joiner should match them and generate a new event (Sinked Event) with the original MovementEvent and a list of the matched TransactionEvents.
For reference, the Movement topic has 12 million records, while the Transaction topic has 21 million.
Implementation
public class SinkEventProcessor implements BiFunction<
        KTable<TransactionKey, Transaction>,
        KStream<String, Movement>,
        KTable<SinkedEventKey, SinkedEvent>> {

    @Override
    public KTable<SinkedEventKey, SinkedEvent> apply(final KTable<TransactionKey, Transaction> transactionTable,
                                                     final KStream<String, Movement> movementStream) {
        return movementStream
            // [A]
            .flatMap((movementKey, movement) -> movement
                .getTransactionIds()
                .stream()
                .distinct()
                .map(transactionId -> new KeyValue<>(
                    TransactionKey.newBuilder()
                        .setTransactionId(transactionId)
                        .build(),
                    movement))
                .toList())
            // [B]
            .join(transactionTable, (movement, transaction) -> Pair.newBuilder()
                .setMovement(movement)
                .setTransaction(transaction)
                .build())
            // [C]
            .groupBy((transactionKey, pair) -> SinkedEventKey.newBuilder()
                .setMovementId(pair.getMovement().getMovementId())
                .build())
            // [D]
            .aggregate(SinkedEvent::new, (key, pair, collectable) ->
                collectable.setMovement(pair.getMovement())
                    .addTransaction(pair.getTransaction()));
    }
}
[A] I start by iterating the Movement KStream, extracting each transactionId and creating a TransactionKey to use as the new key for the following operation, which facilitates the join with each transactionId present in the Movement entity. This operation returns a KStream<TransactionKey, Movement>.
[B] Joins the previously transformed KStream with the Transaction KTable and wraps both values in an intermediate pair. Returns a `KStream<TransactionKey, Pair>`.
[C] Groups the pairs by movementId and constructs the new key (SinkedEventKey) for the sink operation.
[D] Aggregates into the result object (SinkedEvent) by adding each transaction to the list. This operation also sinks to the topic as a KTable<SinkedEventKey, SinkedEvent>. A worked example of the expected end-to-end result is sketched below.
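For illustration, here is a plain-Java sketch (not Kafka Streams code; the types are simplified placeholders for the builder-generated classes above) of what steps [A] to [D] are expected to produce for a single Movement with four transaction ids: one fanned-out record per id, each joined to its Transaction, and everything folded back into a single SinkedEvent keyed by movementId.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FlowSketch {
    public static void main(String[] args) {
        // Hypothetical data: one Movement "m1" referencing four transactions.
        String movementId = "m1";
        List<String> transactionIds = List.of("t1", "t2", "t3", "t4");
        Map<String, String> transactionTable =
                Map.of("t1", "tx1", "t2", "tx2", "t3", "tx3", "t4", "tx4");

        // [A] + [B]: one (transactionId, movement) record per id, joined against the table.
        // [C] + [D]: grouped back by movementId and folded into a single list.
        List<String> joinedTransactions = new ArrayList<>();
        for (String id : transactionIds) {
            joinedTransactions.add(transactionTable.get(id));
        }

        // Expected outcome: exactly one SinkedEvent for "m1" containing all four transactions.
        System.out.println(movementId + " -> " + joinedTransactions);
    }
}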
Problem
The problem starts when we begin processing the stream: the sink operation of the processor generates more records than it should. For instance, for a Movement with 4 transaction_ids, the output topic ends up looking like this:
partition | offset | count of [TransactionEvent] | expected count
----------+--------+-----------------------------+---------------
    0     |   1    |              1              |       4
    0     |   2    |              2              |       4
    0     |   3    |              4              |       4
    0     |   4    |              4              |       4
The same happens for other records (e.g. a Movement with 13 transaction_ids yields 13 messages). So, for some reason I can't comprehend, the aggregate operation sinks on every update instead of collecting into the list and sinking only once.
I've tried to reproduce it in a development cluster with exactly the same settings, to no avail. Everything seems to work properly when I try to reproduce it (a Movement with 8 transactions produces only 1 record), but whenever I bring it to production it doesn't behave as intended. I'm not sure what I'm missing; any help?
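Note (an assumption about the likely mechanism, not a confirmed diagnosis): a KTable aggregation forwards one update per input record by default, and whether consecutive updates for the same key are coalesced before reaching the sink topic depends on record caching. A minimal sketch of the two standard settings involved, with placeholder application id and broker:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class CacheConfigSketch {
    public static Properties streamsProperties() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sink-event-processor"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");       // placeholder
        // With the cache disabled (0), every single join result is forwarded downstream;
        // with a larger cache, updates for the same key are coalesced until the cache is flushed.
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024);
        // The cache is also flushed on every commit, so a short commit interval
        // produces more intermediate records.
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30_000);
        return props;
    }
}

If the development and production clusters differ in these settings (or in commit frequency under load), the same topology can emit once per Movement in one environment and once per joined transaction in the other.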

Related

Can I avoid repartition in the below kafka stream

I am trying to create a table from streaming through topic by changing the key but the value remains the same. Is it possible to avoid repartition?
streamsBuilder.stream(
    TOPIC,
    Consumed.with(IdSerde(), ValueSerde())
)
    .peek { key, value -> logger.info("Consumed $TOPIC, key: $key, value: $value") }
    .filter { _, value -> value != null }
    .selectKey(
        { _, value -> NewKey(value.newKey.toString()) },
        Named.`as`("changeKey")
    )
    .toTable(
        Materialized.`as`<NewKey, Value, KeyValueStore<Bytes, ByteArray>>(
            NEW_TABLE_NAME
        )
            .withKeySerde(NewKeySerde())
            .withValueSerde(ValueSerde())
    )
return streamsBuilder.build()
The only way I can think of is for your producer to send the data to the input topic with the appropriate key, rather than selecting the key in the streams application.
Otherwise, the answer is no.
Before going into the explanation, a bit of background on KTable
A KTable is an abstraction of a changelog stream. This means a KTable
holds only the latest value for a given key.
Consider the following four records (in order) being sent to the stream:
("alice", 1) --> ("bob", 100) --> ("alice", 3) --> ("bob", 50)
For the above, the KTable would look as follows (alice and bob being the keys of the messages):
("alice", 3)
("bob", 50)
Explanation
Whenever we run multiple instances of a Kafka Streams application, the KTable in each instance holds only the local state of the application. Hence, repartitioning is required to ensure that messages with the same key land in the same partition.
To understand this better, let's consider an example. We have a topic with two partitions. Partition 0 has a single event and Partition 1 has three events, as follows (null represents no key):
Topic Partition 0: (null, {"name": "alice", "count": 1})
Topic Partition 1: (null, {"name": "alice", "count": 3}), (null, {"name": "bob", "count": 100}), (null, {"name": "bob", "count": 50})
We create a Kafka Streams application to read data from this topic and create a KTable using the name field as the key. The streams application is running with two instances. Each instance is assigned a single partition, as shown below:
Topic Partition 0: -----> Instance 1
Topic Partition 1: -----> Instance 2
Since KTables are maintained locally per instance, if no repartitioning is done the KTable will be in an inconsistent state on both instances.
This is shown below:
Instance 1 KTable : ("alice", {"count":1})
Instance 2 KTable : ("alice", {"count":3}), ("bob", {"count":50})
Hence, to avoid issues like the above, Kafka Streams repartitions the topic if a KTable is created after a selectKey operation.
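Here is a Java sketch of the same pattern as the Kotlin snippet in the question (topic name, serdes, and the key-extraction lambda are placeholders): because selectKey() changes the key, Kafka Streams inserts an internal repartition topic before the table is materialized, so that all records with the same new key land in the same partition and therefore in the same instance's local store.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;

public class RepartitionSketch {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .filter((key, value) -> value != null)
               // Changing the key forces a repartition before the table can be built.
               .selectKey((key, value) -> value.toUpperCase())
               .toTable();
        // Topology#describe() will show the generated ...-repartition topic.
        return builder.build();
    }
}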

Kafka kstream comparing two values from two different topics

I'm currently trying to send two different formats of the same event on two different topics. Let's say format A to topic A and format B to topic B.
Format B is only sent approximately 15% of the time, since it's not supported for older stuff. And if B is sent, there will be an A equivalent of each event.
What I want to do is listen to both at the same time and, if B exists, discard A.
What I've tried so far is to listen on both (I'm using KStreams) and do a stream-stream join:
streamA.leftJoin(streamB, (A_VALUE, B_VALUE) -> {
        if (B_VALUE != null && A_VALUE != null) {
            return B_VALUE;
        } else if (A_VALUE != null && B_VALUE == null) {
            return A_VALUE;
        }
        return null;
    },
    JoinWindows.of(Duration.ofMinutes(5)).grace(Duration.ofMinutes(15)),
    Joined.with(
        Serdes.String(),
        Serdes.String(),
        Serdes.String()
    ))
Running tests at a load between 50-200 events/s, I've seen results such as:
the number of B_VALUEs sent is always correct,
but the number of A_VALUEs is larger than expected.
I think that sometimes it's sending both A and B.
I've also tried using a Guava cache as a "hashmap with TTL", storing all the B events and then comparing that way. There I find that the total amount is always correct, but there are fewer B events than expected, meaning that sometimes it does not find a match.
If there is a better way of doing this without using databases, I'm open to it!
Note: correlated events always have the same key, e.g. <432, A_VALUE>, <432, B_VALUE>.

TopologyTestDriver sending incorrect message on KTable aggregations

I have a topology that aggregates on a KTable.
This is a generic method I created to build this topology on different topics I have.
public static <A, B, C> KTable<C, Set<B>> groupTable(KTable<A, B> table, Function<B, C> getKeyFunction,
        Serde<C> keySerde, Serde<B> valueSerde, Serde<Set<B>> aggregatedSerde) {
    return table
        .groupBy((key, value) -> KeyValue.pair(getKeyFunction.apply(value), value),
                Serialized.with(keySerde, valueSerde))
        .aggregate(() -> new HashSet<>(), (key, newValue, agg) -> {
            agg.remove(newValue);
            agg.add(newValue);
            return agg;
        }, (key, oldValue, agg) -> {
            agg.remove(oldValue);
            return agg;
        }, Materialized.with(keySerde, aggregatedSerde));
}
This works pretty well when using Kafka, but not when testing via `TopologyTestDriver`.
In both scenarios, when I get an update, the subtractor is called first and then the adder. The problem is that when using the TopologyTestDriver, two messages are sent out for updates: one after the subtractor call and another one after the adder call. Not to mention that the message sent after the subtractor and before the adder is in an incorrect state.
Can anyone else confirm this is a bug? I've tested this on both Kafka versions 2.0.1 and 2.1.0.
EDIT:
I created a test case on GitHub to illustrate the issue: https://github.com/mulho/topology-testcase
It is expected behavior that there are two output records (one "minus" record, and one "plus" record). It's a little tricky to understand how it works, so let me try to explain.
Assume you have the following input table:
key | value
-----+---------
A | <10,2>
B | <10,3>
C | <11,4>
On KTable#groupBy() you extract the first part of the value as the new key (i.e., 10 or 11) and later sum the second part (i.e., 2, 3, 4) in the aggregation. Because the A and B records both have 10 as the new key, you sum 2+3, and you also sum 4 for the new key 11. The result table would be:
key | value
-----+---------
10 | 5
11 | 4
Now assume that an update record <B,<11,5>> changes the original input KTable to:
key | value
-----+---------
A | <10,2>
B | <11,5>
C | <11,4>
Thus, the new result table should sum up 5+4 for 11 and 2 for 10:
key | value
-----+---------
10 | 2
11 | 9
If you compare the first result table with the second, you might notice that both rows got updated. The old B|<10,3> record is subtracted from 10|5, resulting in 10|2, and the new B|<11,5> record is added to 11|4, resulting in 11|9.
These are exactly the two output records you see. The first output record (emitted after the subtract is executed) updates the first row: it subtracts the old value that is no longer part of the aggregation result. The second record adds the new value to the aggregation result. In our example, the subtract record would be <10,<null,<10,3>>> and the add record would be <11,<<11,5>,null>> (the format of those records is <key, <plus,minus>>; note that the subtract record only sets the minus part, while the add record only sets the plus part).
Final remark: it is not possible to put the plus and minus records together, because the keys of the plus and minus records can be different (in our example, 11 and 10) and thus might go into different partitions. This implies that the plus and minus operations might be executed by different machines, so it's not possible to emit a single record that contains both the plus and the minus part.
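To make the subtract-then-add sequence concrete, here is a plain-Java walk-through of the example above (not Kafka Streams code; it just replays the two updates against the summed result table):

import java.util.HashMap;
import java.util.Map;

public class SubtractThenAddSketch {
    public static void main(String[] args) {
        // Result table after the first three records: 10 -> 5 (i.e. 2+3), 11 -> 4.
        Map<Integer, Integer> agg = new HashMap<>();
        agg.put(10, 5);
        agg.put(11, 4);

        // The update <B,<11,5>> replaces B's old value <10,3>.
        agg.merge(10, -3, Integer::sum); // subtractor: emits the "minus" update (10, 2)
        System.out.println("minus update: 10 -> " + agg.get(10));
        agg.merge(11, 5, Integer::sum);  // adder: emits the "plus" update (11, 9)
        System.out.println("plus update:  11 -> " + agg.get(11));
    }
}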

Flink: Table API copy operators in execution plan

I use the Flink 1.2.0 Table API to process some streaming data. The following is my code:
val dataTable = myDataStream

// table A
val tableA = dataTable
  .window(Tumble over 5.minutes on 'rowtime as 'w)
  .groupBy("w, group1, group2")
  .select("w.start as time, group1, group2, data1.sum as data1, data2.sum as data2")
tableEnv.registerTable("tableA", tableA)

// table A sink
tableA.writeToSink(sinkTableA)

//...
// I should get some other outputs from the TableA output
//...

val tableAView = tableEnv.ingest("tableA")

// table result1
val result1 = tableAView
  .window(Tumble over 5.minutes on 'rowtime as 'w)
  .groupBy("w, group1")
  .select("w.start as time, group1, data1.sum as data1")

// result1 sink
result1.writeToSink(sinkResult1)

// table result2
val result2 = tableAView
  .window(Tumble over 5.minutes on 'rowtime as 'w)
  .groupBy("w, group2")
  .select("w.start as time, group2, data2.sum as data1")

// result2 sink
result2.writeToSink(sinkResult2)
I expected to get the following tree in the Flink execution plan, the same as I have for Flink streaming in my other Flink jobs:
DataStream_Operators -> TableA_Operators -> TableA_Sink
                                         |-> Result1_Operators -> Result1_Sink
                                         |-> Result2_Operators -> Result2_Sink
But I get this instead, with three copies of the same operators for TableA:
DataStream_Operators -> TableA_Operators -> TableA_Sink
                     |-> Copy_of_TableA_Operators -> Result1_Operators -> Result1_Sink
                     |-> Copy_of_TableA_Operators -> Result2_Operators -> Result2_Sink
As a result, this job performs badly on large input data.
How can I fix this and get an optimal execution plan?
I understand that the Flink Table API and SQL are experimental features, and maybe this will be fixed in future versions.
At the current state, the Table API translates the whole query whenever you convert a Table into a DataSet or DataStream or write it to a TableSink. In your program, you call writeToSink three times, which means that the complete query is translated each time.
But what is the complete query? It is all the Table API operators that have been applied to a Table. When you register a Table in the TableEnvironment, it is basically registered as a view, i.e., only its definition (all the operators that define the Table) is registered. Therefore, these operators are translated again when you call writeToSink the second and third time.
You can solve this issue by translating tableA into a DataStream and registering the DataStream in the TableEnvironment instead of registering it as a Table. This would look as follows:
val tableA = ...
val streamA = tableA.toDataStream[X] // X should be a case class for rows of tableA
tableEnv.registerDataStream("tableA", streamA)
tableEnv.ingest("tableA").writeToSink(sinkTableA) // emit tableA by ingesting the registered DataStream
I know this is not very convenient, but at the moment it is the only way to avoid repeated translation of a Table.

How to manage Kafka KStream to Kstream windowed join?

Based on the Apache Kafka docs, KStream-to-KStream joins are always windowed joins. My question is: how can I control the size of the window? Is it the same as the retention period for the data on the topic? Or can we, for example, keep data for 1 month but join the streams over just the past week?
Is there any good example that shows a windowed KStream-to-KStream join?
In my case, let's say I have 2 KStreams, kstream1 and kstream2, and I want to be able to join 10 days of kstream1 to 30 days of kstream2.
That is absolutely possible. When you define your stream operator, you specify the join window size explicitly.
KStream stream1 = ...;
KStream stream2 = ...;

long joinWindowSizeMs = 5L * 60L * 1000L; // 5 minutes
long windowRetentionTimeMs = 30L * 24L * 60L * 60L * 1000L; // 30 days

stream1.leftJoin(stream2,
    ... // add ValueJoiner
    JoinWindows.of(joinWindowSizeMs)
);

// or if you want to use retention time
stream1.leftJoin(stream2,
    ... // add ValueJoiner
    (JoinWindows) JoinWindows.of(joinWindowSizeMs)
        .until(windowRetentionTimeMs)
);
See http://docs.confluent.io/current/streams/developer-guide.html#joining-streams for more details.
The sliding window basically defines an additional join predicate. In SQL-like syntax this would be something like:
SELECT * FROM stream1, stream2
WHERE
stream1.key = stream2.key
AND
stream1.ts - before <= stream2.ts
AND
stream2.ts <= stream1.ts + after
where before == after == joinWindowSizeMs in this example. before and after can also have different values if you use JoinWindows#before() and JoinWindows#after() to set those values explicitly.
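Applied to the 10-day/30-day case from the question, an asymmetric window could be sketched like this (the concrete values, and which stream counts as "before" versus "after", are assumptions made only to illustrate the API; it uses the same long-based JoinWindows methods as the example above):

import java.util.concurrent.TimeUnit;
import org.apache.kafka.streams.kstream.JoinWindows;

public class AsymmetricWindowSketch {
    // stream2 records match a stream1 record if their timestamp lies in
    // [stream1.ts - before, stream1.ts + after].
    static final JoinWindows ASYMMETRIC_WINDOW = JoinWindows.of(0L)
            .before(TimeUnit.DAYS.toMillis(10))
            .after(TimeUnit.DAYS.toMillis(30));
}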
The retention time of the source topics is completely independent of the specified windowRetentionTimeMs, which is applied to a changelog topic created by Kafka Streams itself. Window retention allows out-of-order records to be joined with each other, i.e., records that arrive late (keep in mind that Kafka has an offset-based ordering guarantee, but with regard to timestamps, records can be out of order).
In addition to what Matthias J. Sax said, there is a stream-to-stream (windowed) join example at:
https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/test/java/io/confluent/examples/streams/StreamToStreamJoinIntegrationTest.java
This is for Confluent 3.1.x with Apache Kafka 0.10.1, i.e. the latest versions as of January 2017. See the master branch in the repository above for code examples that use newer versions.
Here's the key part of the code example above (again, for Kafka 0.10.1), slightly adapted to your question. Note that this example happens to demonstrate an OUTER JOIN.
long joinWindowSizeMs = TimeUnit.MINUTES.toMillis(5);
long windowRetentionTimeMs = TimeUnit.DAYS.toMillis(30);

final Serde<String> stringSerde = Serdes.String();
KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> alerts = builder.stream(stringSerde, stringSerde, "adImpressionsTopic");
KStream<String, String> incidents = builder.stream(stringSerde, stringSerde, "adClicksTopic");

KStream<String, String> impressionsAndClicks = alerts.outerJoin(incidents,
    (impressionValue, clickValue) -> impressionValue + "/" + clickValue,
    // KStream-KStream joins are always windowed joins, hence we must provide a join window.
    JoinWindows.of(joinWindowSizeMs).until(windowRetentionTimeMs),
    stringSerde, stringSerde, stringSerde);

// Write the results to the output topic.
impressionsAndClicks.to(stringSerde, stringSerde, "outputTopic");