Apache Flink and Kafka Streams have the concept of a session window.
The window is defined based on the time between two consecutive messages from the same key.
If the time between two consecutive messages is less than the specified session gap, then the messages are considered to belong to the same session.
If the gap is larger than the session gap, the window is emitted and a new window is started.
It is trivial to configure a session window in both Flink and Kafka Streams:
.window(EventTimeSessionWindows.withGap(Time.minutes(10)))
.windowedBy(SessionWindows.with(Duration.ofMinutes(5)).grace(Duration.ofSeconds(30)))
I tried to do the same thing with Reactor, but I cannot find a way to do it; my knowledge of Reactor is probably too limited.
I see that Reactor has multiple variations of the window operation, like windowWhile, windowUntil, windowUntilChanged.
But the predicates that they take as arguments evaluate only the current key, not the gap to the previous key.
Thanks!
I can answer my own question after reading through the Reactor docs and searching for the relevant operator.
The session window functionality can be achieved with either of two operators, bufferUntilChanged or windowUntilChanged, which have the following signatures:
<V> Flux<List<T>> bufferUntilChanged(Function<? super T,? extends V> keySelector, BiPredicate<? super V,? super V> keyComparator)
<V> Flux<Flux<T>> windowUntilChanged(Function<? super T,? extends V> keySelector, BiPredicate<? super V,? super V> keyComparator)
The first Function argument selects the key that should be compared.
In the case of a session window, this key should be the event time stored in the event.
The second BiPredicate argument compares the current key with the previous key.
In the case of a session window, I subtract the previous event time from the current event time. If the difference is smaller than the session gap, I return true, which adds the current item to the ongoing buffer/window; otherwise I return false, which emits the ongoing buffer/window and starts a new one.
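To make the comparator logic concrete, here is a minimal plain-Java sketch of the same idea. It does not use Reactor itself: the class SessionGapSketch and the methods withinGap and sessions are hypothetical names, and the loop is only a stand-in for what bufferUntilChanged does with the key selector and key comparator. It assumes events arrive ordered by event time and that a difference strictly smaller than the gap keeps the session open.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.ToLongFunction;

class SessionGapSketch {

    // Same contract as the keyComparator: true means "still the same session".
    // Assumption: a difference strictly smaller than the gap keeps the session open.
    static BiPredicate<Long, Long> withinGap(Duration gap) {
        return (previousTime, currentTime) -> currentTime - previousTime < gap.toMillis();
    }

    // Stand-in for bufferUntilChanged(eventTime, withinGap(gap)) on an ordered input.
    static <T> List<List<T>> sessions(List<T> events, ToLongFunction<T> eventTime, Duration gap) {
        BiPredicate<Long, Long> sameSession = withinGap(gap);
        List<List<T>> result = new ArrayList<>();
        List<T> current = new ArrayList<>();
        Long previousTime = null;
        for (T event : events) {
            long time = eventTime.applyAsLong(event);
            // Comparator returned false: emit the ongoing buffer and start a new one.
            if (previousTime != null && !sameSession.test(previousTime, time)) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(event);
            previousTime = time; // always compare against the immediately preceding event
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```

With Reactor, the equivalent would presumably be passing the event-time getter as the keySelector and a predicate like withinGap as the keyComparator.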
We are seeing some weird behaviour with a processWindow function emitting two records:
the first record contains complete information using aggregated data present in the window, and the second record contains partial information, with some information removed from the record.
The processWindow function uses state (MapState) as follows:
override def open(parameters: Configuration): Unit = {
  cfState = getRuntimeContext.getMapState(
    new MapStateDescriptor[(String, Int), mutable.Map[Int, mutable.Set[Int]]](
      "customFieldsState",
      classOf[(String, Int)],
      classOf[mutable.Map[Int, mutable.Set[Int]]]
    )
  )
}
and the process function manipulates the above state using the records present in the window.
Is this an anti-pattern, using state within a processWindow function? Are there any other recommendations for using state within a processWindow function?
We need to maintain state in this case as we don't capture all fields in a single window and we need to aggregate the records per user, hence the use of a window function.
Thanks
If you want to maintain state beyond the lifetime of a single window instance, you should use
KeyedStateStore ProcessWindowFunction.Context#globalState
All other state is cleared when the window is closed.
Since globalState is never cleared by Flink, you should set state TTL on the state descriptor you use if you will have keys that go stale, in order to avoid leaking state over time.
The Streams DSL documentation includes a caveat about using the aggregate method to transform a KGroupedTable → KTable, as follows (emphasis mine):
When subsequent non-null values are received for a key (e.g., UPDATE), then (1) the subtractor is called with the old value as stored in the table and (2) the adder is called with the new value of the input record that was just received. The order of execution for the subtractor and adder is not defined.
My interpretation of that last line is that one of three things can happen:
subtractor can be called before adder
adder can be called before subtractor
adder and subtractor could be called at the same time
Here is the question I'm looking to get answered:
Are all 3 scenarios above actually possible when using the aggregate method on a KGroupedTable?
Or am I misinterpreting the documentation? For my use-case (detailed below), it would be ideal if the subtractor were always called before the adder.
Why is this question important?
If the adder and subtractor are non-commutative operations and the order in which they are executed can vary, you can end up with different results depending on that order. An example of a useful non-commutative operation is aggregating records into a Set:
.aggregate[Set[Animal]](Set.empty)(
  adder = (zooKey, animalValue, setOfAnimals) => setOfAnimals + animalValue,
  subtractor = (zooKey, animalValue, setOfAnimals) => setOfAnimals - animalValue
)
In this example, for duplicated events, if the adder is called before the subtractor you would end up removing the value entirely from the set (which would be problematic for most use-cases I imagine).
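To illustrate the concern, here is a small plain-Java sketch that applies a set-based adder and subtractor in both orders for a duplicated event. It does not use the Kafka Streams API; AggregateOrderSketch, adder, subtractor and update are hypothetical names that only mimic the Scala snippet above.

```java
import java.util.HashSet;
import java.util.Set;

class AggregateOrderSketch {

    // Mimics the adder: returns a new set with the value added.
    static Set<String> adder(String animal, Set<String> animals) {
        Set<String> result = new HashSet<>(animals);
        result.add(animal);
        return result;
    }

    // Mimics the subtractor: returns a new set with the value removed.
    static Set<String> subtractor(String animal, Set<String> animals) {
        Set<String> result = new HashSet<>(animals);
        result.remove(animal);
        return result;
    }

    // Applies one update (old value replaced by new value) in the chosen order.
    static Set<String> update(Set<String> current, String oldValue, String newValue,
                              boolean subtractFirst) {
        if (subtractFirst) {
            return adder(newValue, subtractor(oldValue, current));
        }
        return subtractor(oldValue, adder(newValue, current));
    }
}
```

For a duplicated event where the old and new value are both "lion", subtract-then-add leaves "lion" in the set, while add-then-subtract removes it.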
Why am I doubting the documentation (assuming my interpretation of it is correct)?
Seems like an unusual design choice
When I've run unit tests (using TopologyTestDriver and EmbeddedKafka), I always see the subtractor is called before the adder. Unfortunately, if there is some kind of race condition involved, it's entirely possible that I would never hit the other scenarios.
I did try looking into the kafka-streams codebase as well. The KTableProcessorSupplier that calls the user-supplied adder/subtractor functions appears to be this one: https://github.com/apache/kafka/blob/18547633697a29b690a8fb0c24e2f0289ecf8eeb/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableAggregate.java#L81 and on line 92, you can even see a comment saying "first try to remove the old value". It seems like this would answer my question definitively, right? Unfortunately, in my own testing, I saw that the process function itself is called twice: first with a Change<V> value that includes only the old value, and then again with a Change<V> value that includes only the new value. I haven't been able to dig deep enough to find the internal code that generates the old-value record and the new-value record (upon receiving an update) to determine whether it actually produces those records in that order.
The order is hard-coded (i.e., no race condition), but there is no guarantee that the order won't change in future releases without notice (i.e., it's not a public contract and no KIP is needed to change it). I guess there would be a Jira about it... But as a matter of fact, it does not really matter (details below).
Of the three scenarios you mentioned, the third cannot happen, though: aggregators are executed in a single thread (per shard), and thus either the adder or the subtractor is called first.
"first with a Change<V> value that includes only the old value and then the process function is called again with a Change<V> value that includes only the new value."
In general, both records might be processed by different threads, and thus it's not possible to send only one record. It's just that the TopologyTestDriver (TTD) simulates single-threaded execution, so both records always end up in the same processor.
Cf. "TopologyTestDriver sending incorrect message on KTable aggregations"
However, the order actually only matters if both records really end up in the same processor (if the grouping key did not change during the upstream update).
Furthermore, the order depends not on the downstream aggregate implementation but on the order of writes into the repartition topic of the groupBy(), and with multiple parallel upstream processors, those writes are interleaved anyway. Thus, in general, you should think of the "add" and "subtract" parts as independent entities and not make any assumption about their order (also, even if the key did not change, both records might be interleaved with other records...).
The only guarantee provided is (given that you configured the producer correctly to avoid re-ordering during send()), that if the grouping key does not change, the send of the old and new value will not be re-ordered relative to each other. The order of the send is hard-coded in the upstream processor though:
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableRepartitionMap.java#L93-L99
Thus, the order of the downstream aggregate processor is actually meaningless.
The Kafka Streams 2.2.0 documentation for the WindowStore and ReadOnlyWindowStore method fetch(K key, Instant from, Instant to) states:
For each key, the iterator guarantees ordering of windows, starting
from the oldest/earliest available window to the newest/latest window.
None of the other fetch methods state this (except the deprecated fetch(K key, long from, long to)), but do they offer the same guarantee?
Additionally, is there any guarantee on ordering of records within a given window? Or is that up to the underlying hashing collection (I assume) implementation and handling of possible hash collisions?
I should also note that we built the WindowStore with retainDuplicates() set to true. So a single key would have multiple entries within a window. Unless we're using it wrong; which I guess would be a different question...
The other methods don't have ordering guarantees, because the order depends on the byte-order of the serialized keys. It's hard to reason about this ordering for Kafka Streams, because the serializers are provided by the user.
I should also note that we built the WindowStore with retainDuplicates() set to true. So a single key would have multiple entries within a window. Unless we're using it wrong; which I guess would be a different question...
You are using it wrong :) -- you can store different keys for the same window by default. If you enable retainDuplicates() you can store the same key multiple times for the same window.
This is the context:
There is an input event stream,
There are some methods to apply to the stream, each of which applies different logic to evaluate each event and marks it as a "good" or "bad" event.
An event is really "good" only if it passes all the methods; otherwise it is a "bad" event.
There is an output event stream that carries each event's result and its eventID.
To solve this problem, I have two ideas:
We can apply each method sequentially to each event. But this is a kind of batch processing and doesn't use the advantages of stream processing; at the same time, it takes Time(M(ethod)1) + Time(M2) + Time(M3) + ..., which may not be suitable for real-time processing.
We can pass the input stream to each method and run the methods in parallel; each method saves the bad events into permanent storage, and the Main method then queries the permanent storage to get the result for each event. But this leaves some problems to solve:
how to execute the methods in parallel in the programming language (e.g. Scala), and what about performance (network, CPU, memory)?
how to solve the synchronization problem? Those methods certainly need some time to calculate and save the flag into permanent storage, but Main needs less time to query the flag, so a delay issue occurs.
etc.
This is more of a design question than a specific technical one. I would like to hear your ideas, whether they are new approaches or ways to solve the problems above. Looking forward to your opinions.
Parallel streams, each doing the full set of evaluations sequentially, is the more straightforward solution. But if that introduces too much latency, then you can fan out the evaluations to be done in parallel, and then bring the results back together again to make a decision.
To do the fan-out, look at the split operation on DataStream, or use side outputs. But before doing this n-way fan-out, make sure that each event has a unique ID. If necessary, add a field containing a random number to each event to use as the unique ID. Later we will use this unique ID as a key to gather back together all of the partial results for each event.
Once the event stream is split, each copy of the stream can use a MapFunction to compute one of the evaluation methods.
Gathering all of these separate evaluations of a given event back together is a bit more complex. One reasonable approach here is to union all of the result streams together, and then key the unioned stream by the unique ID described above. This will bring together all of the individual results for each event. Then you can use a RichFlatMapFunction (using Flink's keyed, managed state) to gather the results for the separate evaluations in one place. Once the full set of evaluations for a given event has arrived at this stateful flatmap operator, it can compute and emit the final result.
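The gathering step can be sketched in plain Java as follows. This is not Flink code: VerdictGatherer and onPartialResult are hypothetical names, and the HashMap only stands in for the keyed, managed state that the stateful flatmap operator would use. It assumes the number of evaluation methods is known up front and that each method reports exactly once per event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class VerdictGatherer {

    private final int expectedEvaluations;
    // Stand-in for keyed state: partial results collected so far, per unique event ID.
    private final Map<String, List<Boolean>> pending = new HashMap<>();

    VerdictGatherer(int expectedEvaluations) {
        this.expectedEvaluations = expectedEvaluations;
    }

    // Called once per partial result from the unioned stream.
    // Returns the final verdict once every evaluation for this event has arrived.
    Optional<Boolean> onPartialResult(String eventId, boolean passed) {
        List<Boolean> results = pending.computeIfAbsent(eventId, id -> new ArrayList<>());
        results.add(passed);
        if (results.size() < expectedEvaluations) {
            return Optional.empty(); // still waiting for other evaluations
        }
        pending.remove(eventId); // clear state so it does not grow without bound
        boolean good = results.stream().allMatch(Boolean::booleanValue);
        return Optional.of(good); // "good" only if every method passed
    }
}
```

In a real job you would also want some timeout or cleanup for events whose partial results never all arrive.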
I have some simple test code for Akka Streams (written in C#, but the Scala version isn't much different):
var source = Source.From(Enumerable.Range(1, 3));
var flow = Flow.FromFunction(new Func<int, string>(x => (x * 2).ToString()));
var sink = Sink.ForEach<string>(output.Add);
var runnable = source.Via(flow).To(sink);
Since the Via helper method is just a shortcut for ViaMaterialized(flow, Keep.Left), I can rewrite the code like this:
var source = Source.From(Enumerable.Range(1, 3));
var flow = Flow.FromFunction(new Func<int, string>(x => (x * 2).ToString()));
var sink = Sink.ForEach<string>(output.Add);
var runnable = source.ViaMaterialized(flow, Keep.Left).To(sink);
The Keep property (Left, Right, Both or None) tells the stream materializer that it should preserve the value on the specified side of the stream operation. But I notice that if I change Keep.Left to Keep.Right, Keep.Both or even Keep.None, nothing changes in the execution outcome: the sink always receives the output of the flow transformation function.
I thought that using a non-None Keep value for Flow stages in a stream graph was necessary to ensure the values get sent to the sink. I must have misunderstood the meaning of this, so my question is: why does a stream flow work even when materialization is disabled for both sides? And can you give an example where changing the Keep value between Left, Right, Both and None affects the values that reach the sink?
You are confusing the fact that a stream gets materialized and the fact that it has a materialized value.
A flow (or more generally a graph) is a blueprint for a stream. When you use the run() method on a runnable graph, a stream is materialized using this blueprint. This stream does whatever is expected of it without any regard for materialized values.
What is a materialized value? When you use the method run(), a value is returned. That's the materialized value for your stream. Most of the time (for simple built-in stages), the materialized value is unimportant (it's called NotUsed in Scala, I don't know about .NET). A non-trivial example is Sink.ignore, which is materialized as a Future[Done]. It gives you a handle on when the particular stream you have materialized will have completely consumed its input (or thrown an error). More generally, the materialized value gives you some circumstantial information on what's going on in your stream (sorry about the vagueness of this statement, but the principle at hand is too general for me to be more explicit).
When building a graph, you put together different pieces that all have a different materialized value. Since you can only have one for your runnable graph, you need to combine them in some way. Keep.{right, left, both, none} are simple functions that combine those values by keeping only one of the values, or both, or none. However, it does not change the fact that both graphs will be materialized, and the values generated, even if you decide not to keep them.
Keep.* functions don't influence the materialization process itself, only what you get out of it.
More specifically, at materialization time (i.e. when run() is called), each and every stage of your stream (in your example, source, flow and sink) will always be materialized - and therefore produce a materialized value under the hood. You can clearly see what that value will be from their last type parameter.
For the user's convenience, as most likely you will not be interested in all of them, you can use Keep.* accordingly to select what to keep around. This directly reflects on the return type of run().
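As a rough mental model only, the following plain-Java sketch treats Keep.* as ordinary two-argument functions that pick which materialized value survives. It is not the Akka API: KeepSketch, run and the string placeholders for materialized values are all hypothetical. The point it tries to show is that the elements reach the sink no matter which combiner is used; only the value returned by run changes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

class KeepSketch {

    // Each combiner just selects from the two materialized values.
    static <L, R> BiFunction<L, R, L> left() { return (l, r) -> l; }
    static <L, R> BiFunction<L, R, R> right() { return (l, r) -> r; }
    static <L, R> BiFunction<L, R, Map.Entry<L, R>> both() { return (l, r) -> new SimpleEntry<>(l, r); }
    static <L, R> BiFunction<L, R, Void> none() { return (l, r) -> null; }

    // Toy "materialization": every stage runs, and the combiner only decides
    // what the caller gets back.
    static <M> M run(List<Integer> source, Function<Integer, String> flow, List<String> sinkOutput,
                     String leftMat, String rightMat, BiFunction<String, String, M> combine) {
        for (Integer element : source) {
            sinkOutput.add(flow.apply(element)); // data flow is independent of the combiner
        }
        return combine.apply(leftMat, rightMat);
    }
}
```

Swapping left() for none() in a call to run changes the returned value from the left placeholder to null, while sinkOutput ends up with the same elements either way.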