Pair Rx Sequences with one sequence as the master that controls when a new output is published - system.reactive

I'd like to pair two sequences D and A with Reactive Extensions in .NET. The resulting sequence R should pair D and A such that whenever a new value appears on D, it is paired with the latest value from A, as visualized in the following diagram:
D-1--2---3---4---
A---a------b-----
R----2---3---4---
     a   a   b
Neither CombineLatest nor Zip does exactly what I want. Any ideas on how this can be achieved?
Thanks!

You want Observable.MostRecent:
var R = A.Publish(_A => D.SkipUntil(_A).Zip(_A.MostRecent(default(char)), Tuple.Create));
Replace char with whatever the element type of your A observable is.
Conceptually, the query above is the same as the following query.
var R = D.SkipUntil(A).Zip(A.MostRecent(default(char)), Tuple.Create);
The problem with this query is that subscribing to R subscribes to A twice, which is undesirable. In the first (better) query, Publish is used to avoid subscribing to A twice: it hands the lambda a proxy for A, called _A, which you can subscribe to as many times as you like inside the lambda, while the real observable A is only subscribed to once.
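As an aside, this "fire on D, pair with the latest A" shape is exactly what the WithLatestFrom operator in more recent Rx.NET versions provides, so the Publish/Zip/MostRecent combination may not be needed at all. For illustration only, here is a minimal sketch of the same pattern using RxJava 3's withLatestFrom; the language, tick intervals, and class name are my assumptions, not part of the question:

import io.reactivex.rxjava3.core.Observable;
import java.util.concurrent.TimeUnit;

public class PairWithLatest {
    public static void main(String[] args) {
        // D ticks every 100 ms (1, 2, 3, ...); A ticks every 350 ms ('a', 'b', ...).
        Observable<Long> d = Observable.interval(100, TimeUnit.MILLISECONDS).map(i -> i + 1);
        Observable<Character> a = Observable.interval(350, TimeUnit.MILLISECONDS)
                                            .map(i -> (char) ('a' + i));

        // withLatestFrom emits only when d emits, pairing the value with the
        // latest from a; d values arriving before the first a are dropped,
        // matching the diagram above.
        d.withLatestFrom(a, (dv, av) -> dv + "/" + av)
         .take(6)
         .blockingSubscribe(System.out::println);
    }
}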

Related

KStream -- iterate a list of values and send them to an output topic

I have the following scenario:
KStream<String, List<Foo>> fooList = ...; // returned by an upstream call
Now I have to send this to the output topic one element at a time rather than as a whole list, i.e. iterate over the values of fooList and push each one to the destination topic, so that individual messages are produced.
I believe this will work:
fooList.flatMapValues(v -> v).to("output");
You can read more about flatMapValues here: https://kafka.apache.org/31/javadoc/org/apache/kafka/streams/kstream/KStream.html#flatMapValues(org.apache.kafka.streams.kstream.ValueMapperWithKey).
From a mathematical point of view, you are asking how to flatten a KStream of List<Foo>. Since flatten is the same as flatMap with the identity function, this ought to be what you want.
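For context, here is a minimal sketch of the full topology; the topic names and the Foo type are placeholders, and serde configuration is assumed to come from default properties:

import java.util.List;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class FlattenFooList {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Foo is your value class; default serdes are assumed to be configured.
        KStream<String, List<Foo>> fooList = builder.stream("input-topic");

        // flatMapValues emits one output record per list element, keeping the
        // original key; the identity mapper v -> v is enough because
        // List<Foo> is already an Iterable<Foo>.
        fooList.flatMapValues(v -> v).to("output-topic");

        // builder.build() is then passed to new KafkaStreams(...) as usual.
    }
}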

Kafka Streams - adding message frequency in enriched stream

From a stream (k,v), I want to calculate a stream (k, (v,f)) where f is the frequency of the occurrences of a given key in the last n seconds.
Given a topic (t1), if I use a windowed table to calculate the frequency:
KTable<Windowed<Integer>,Long> t1_velocity_table = t1_stream.groupByKey().windowedBy(TimeWindows.of(n*1000)).count();
This will give a windowed table with the frequency of each key.
Assuming I won’t be able to join on a windowed key, instead of using the table above I map the stream to a table with a simple key:
t1_stream.groupByKey()
    .windowedBy(TimeWindows.of(n * 1000))
    .count()
    .toStream()
    .map((k, v) -> new KeyValue<>(k.key(), Math.toIntExact(v)))
    .to(frequency_topic);
KTable<Integer,Integer> t1_frequency_table = builder.table(frequency_topic);
If I now look up this table when a new record arrives on my stream, how do I know whether the lookup table will be updated first or the join will occur first (in which case a stale frequency would be added to the record rather than the current one)? Would it be better to create a stream instead of a table and then do a windowed join?
I want to look up the table with something like this:
KStream<Integer, Tuple<Integer, Integer>> t1_enriched = t1_stream.join(t1_frequency_table, (l, r) -> new Tuple<>(l, r));
So instead of having just a stream of (k,v) I have a stream of (k,(v,f)) where f is the frequency of key k in the last n seconds.
Any thoughts on what would be the right way to achieve this? Thanks.
For the particular program you shared, the stream-side record will be processed first. The reason is that you pipe the data through a topic: when the record is processed, it updates the aggregation result, which emits an update record into the through-topic. Directly afterwards, the same record is processed by the join operator, which at that point still sees the old table state. Only later will a poll() call eventually read the aggregation result from the through-topic and update the table side of the join.
Using the DSL, it does not seem possible to achieve what you want. However, you can write a custom Transformer that re-implements the stream-table join and provides the semantics you need.
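Here is a sketch of that Transformer approach; the topic and store names, the value types, and the window length are placeholders, KeyValue stands in for the question's Tuple type, and it assumes a Kafka Streams version that lets you connect a DSL-materialized store to a transformer by name. The point is that the windowed count and the lookup live in the same sub-topology (no repartitioning in between), so the count is updated before the lookup for the same record runs:

import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.WindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

public class FrequencyEnricher {
    static final String STORE = "freq-store";
    static final Duration WINDOW = Duration.ofSeconds(10); // your n seconds

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<Integer, String> t1_stream = builder.stream("t1");

        // Materialize the windowed count under an explicit name so the
        // transformer below can read the store directly.
        t1_stream.groupByKey()
                 .windowedBy(TimeWindows.of(WINDOW))
                 .count(Materialized.as(STORE));

        // Per-record synchronous lookup against the count store.
        KStream<Integer, KeyValue<String, Long>> enriched = t1_stream.transformValues(
            () -> new ValueTransformerWithKey<Integer, String, KeyValue<String, Long>>() {
                private ProcessorContext context;
                private WindowStore<Integer, Long> store;

                @Override
                @SuppressWarnings("unchecked")
                public void init(ProcessorContext context) {
                    this.context = context;
                    this.store = (WindowStore<Integer, Long>) context.getStateStore(STORE);
                }

                @Override
                public KeyValue<String, Long> transform(Integer key, String value) {
                    long now = context.timestamp();
                    long count = 0L;
                    // Scan the windows covering this record's timestamp and
                    // keep the count of the most recent one.
                    try (WindowStoreIterator<Long> it =
                             store.fetch(key, now - WINDOW.toMillis() + 1, now)) {
                        while (it.hasNext()) {
                            count = it.next().value;
                        }
                    }
                    return KeyValue.pair(value, count);
                }

                @Override
                public void close() { }
            },
            STORE);

        enriched.to("t1-enriched"); // serde configuration omitted
    }
}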

Simple sequence of events

Assume events of type A, B, C, or D are being emitted. I want to detect whenever an event of type A is followed by an event of type B. In other words, I want to detect a sequence, for which Esper's EPL provides the -> operator.
However, what I described above is ambiguous; what I actually want is the following: whenever a B is detected, it should be matched with the most recent A.
I have been playing around with EPL's syntax, but the best I could come up with was this:
select * from pattern [(every a=A) -> b=B]
This, however, matches each B with the oldest A that occurred after the last match. Weird...
Help is much appreciated! :P
I use joins a lot for this kind of simple matching; the other option is match-recognize. The join looks like this:
select * from B unidirectional, A.std:lastevent()
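For completeness, here is a minimal wiring sketch for that statement using the classic (pre-8.0) Esper Java API, which matches the std: view syntax above; the A and B event classes are assumed to exist as plain Java objects:

import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;

public class LatestAJoin {
    public static void main(String[] args) {
        Configuration config = new Configuration();
        config.addEventType("A", A.class);
        config.addEventType("B", B.class);

        EPServiceProvider engine = EPServiceProviderManager.getDefaultProvider(config);

        // unidirectional: only an arriving B triggers output;
        // std:lastevent() keeps just the most recent A for the join.
        EPStatement stmt = engine.getEPAdministrator().createEPL(
            "select * from B unidirectional, A.std:lastevent()");

        stmt.addListener((newEvents, oldEvents) ->
            System.out.println("B joined with latest A: " + newEvents[0].getUnderlying()));
    }
}

Each incoming B then produces exactly one match against whatever A arrived last, which is the semantics asked for.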

Use Spring Batch to write to different Data Sources

For a project I need to process items from one table and generate 3 different items for 3 different tables, all 3 in a second data source different from that of the first item. The implementation uses Spring Batch over Oracle DB. I think this question is similar to what I need, but there only one different item is written at the end.
To illustrate the situation:
DataSource 1    DataSource 2
------------    -----------------------------
Table A         Table B    Table C    Table D
The reader should read one item from table A. In the processor, using the information from the item in A, 3 new items will be created of type B, C and D. In addition, the item from table A will be updated.
The writer should be able to write all 4 items at the same time. My first implementation uses a JpaItemWriter to update item A, but I don't know how the processor could hand the other 3 items to the writer so that all of them are saved together.
Can a processor return several items of different types? Would I need to create 4 steps, each one writing one of the items? And in that case, would it be safe against errors (if writing D fails, would A, B, and C be rolled back)?
Thanks in advance for your support!
Your question is really two questions. Let's look at each individually:
Can an ItemProcessor return multiple items
An ItemProcessor can only return one item at a time for each item that is passed in. Because of this, in your specific scenario, you'll need your ItemProcessor to return a wrapper object that wraps items A, B, C, and D.
How can I write different types in the same step
Spring Batch relies heavily on composition in its programming model. Since your ItemProcessor will be returning a wrapper object, you'll end up writing an ItemWriter that unwraps items A, B, C, and D and delegates the writing of each to the appropriate writer. So in the final solution, you'll end up with 5 ItemWriters: one for each item type and one that wraps all of those. Take a look at our CompositeItemWriter as an example here: https://github.com/spring-projects/spring-batch/blob/master/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/support/CompositeItemWriter.java
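A sketch of how those pieces could fit together; the class and accessor names are illustrative, A through D stand for your entity classes, and the List-based ItemWriter signature of Spring Batch 4 is assumed:

import java.util.List;
import java.util.stream.Collectors;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;

// Wrapper carrying the updated A plus the three derived items.
public class AbcdWrapper {
    private final A a;
    private final B b;
    private final C c;
    private final D d;

    public AbcdWrapper(A a, B b, C c, D d) {
        this.a = a; this.b = b; this.c = c; this.d = d;
    }

    public A getA() { return a; }
    public B getB() { return b; }
    public C getC() { return c; }
    public D getD() { return d; }
}

// Derives B, C and D from each A, updates A, and returns everything wrapped.
class AbcdProcessor implements ItemProcessor<A, AbcdWrapper> {
    @Override
    public AbcdWrapper process(A item) {
        B b = new B(); // populate from item
        C c = new C();
        D d = new D();
        // ...update fields of item (the A entity) here as well...
        return new AbcdWrapper(item, b, c, d);
    }
}

// Unwraps each chunk and delegates to one writer per item type.
class AbcdWriter implements ItemWriter<AbcdWrapper> {
    private final ItemWriter<A> aWriter; // e.g. a JpaItemWriter on DataSource 1
    private final ItemWriter<B> bWriter; // these three write to DataSource 2
    private final ItemWriter<C> cWriter;
    private final ItemWriter<D> dWriter;

    AbcdWriter(ItemWriter<A> aWriter, ItemWriter<B> bWriter,
               ItemWriter<C> cWriter, ItemWriter<D> dWriter) {
        this.aWriter = aWriter;
        this.bWriter = bWriter;
        this.cWriter = cWriter;
        this.dWriter = dWriter;
    }

    @Override
    public void write(List<? extends AbcdWrapper> items) throws Exception {
        aWriter.write(items.stream().map(AbcdWrapper::getA).collect(Collectors.toList()));
        bWriter.write(items.stream().map(AbcdWrapper::getB).collect(Collectors.toList()));
        cWriter.write(items.stream().map(AbcdWrapper::getC).collect(Collectors.toList()));
        dWriter.write(items.stream().map(AbcdWrapper::getD).collect(Collectors.toList()));
    }
}

On the rollback question: everything is rolled back together only if the step's transaction spans both data sources, which in practice means a JTA/XA transaction manager; with two independent transaction managers, a failure writing to DataSource 2 will not undo what was already committed on DataSource 1.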

How can I merge two streams with identical columns in Pentaho?

I am a new user of Pentaho, so maybe my question is very simple. I have two streams with identical columns, e.g. stream S1 has the columns A, B, C, and stream S2 has the columns A, B, C (same names, same order, same data types). I want to merge or append these two streams into a single stream containing the columns A, B, C. However, when I use a Merge join step (with the option FULL OUTER JOIN), the result is a stream with the columns A, B, C, A_1, B_1, C_1, which is not what I want. I tried the Append streams step, but in that case nothing appeared in the preview.
As per your requirement, first create the two streams; here they are read from "stream1.xls" and "stream2.xls". Then build the transformation using the "Sorted merge" step, which merges the sorted rows of both inputs into a single stream with the shared columns A, B, C.