Can a changelog topic be reused? - apache-kafka

I intend to share data aggregated by one stream with another one to reduce (re)processing time when restarting services or rebuilding aggregates.
This stream creates the store to which the changelog topic belongs:
@Bean
fun wirtschaftseinheiten() = Consumer<KStream<String, WirtschaftseinheitAggregat>> {
    it.toTable(Materialized.`as`(wirtschaftseinheitTableStoreSupplier))
}
And this is how I join the changelog topic:
fun KStream<String, ProjektEvent>.leftJoin(wirtschaftseinheiten: KTable<String, WirtschaftseinheitAggregat>): KTable<String, ProjektAggregat> =
mapValues { _, v -> ProjektAggregat(projekt = v, projektErstelltAm = v.metaInfo.createdAt) }
.groupByKey()
// take the earliest date which should be from event with ACTION = CREATE_REQUEST
.reduce { prev, next -> if (next.projektErstelltAm?.isAfter(prev.projektErstelltAm) == true) next.copy(projektErstelltAm = prev.projektErstelltAm) else next }
.toStream()
.toTable(Materialized.`as`(preliminaryProjektStoreSupplier))
.leftJoin(
wirtschaftseinheiten,
{ projektAggregat -> projektAggregat.projekt?.projekt?.technischerPlatz?.take(7) },
{ projektAggregat, wirtschaftseinheit ->
if (wirtschaftseinheit != null) {
projektAggregat + wirtschaftseinheit
} else {
logger().error("No wirtschaftseinheit found for $projektAggregat")
projektAggregat
}
},
Materialized.`as`(projektWirtschaftseinheitJoinStoreSupplier)
)
Unfortunately, no match is ever found because the right side is always null.
Joining the topic directly works, of course, but due to migrations I have to rebuild topics, which also means re-consuming the topic behind wirtschaftseinheitTableStoreSupplier, and that is time-consuming.
Hence the general question: is this a feasible approach? If not, is there a better one?

Switching from KTable to GlobalKTable solved the issue:
fun KStream<String, ProjektEvent>.leftJoin(wirtschaftseinheiten: GlobalKTable<String, WirtschaftseinheitAggregat>): KTable<String, ProjektAggregat> =
mapValues { _, v -> ProjektAggregat(projekt = v, projektErstelltAm = v.metaInfo.createdAt) }
.groupByKey()
// take the earliest date which should be from event with ACTION = CREATE_REQUEST
.reduce(
{ current, next ->
if (next.projektErstelltAm?.isAfter(current.projektErstelltAm) == true)
next.copy(projektErstelltAm = current.projektErstelltAm)
else
next
},
Materialized.`as`(preliminaryProjektStoreSupplier)
)
.toStream()
.leftJoin(
wirtschaftseinheiten,
{ _, projektAggregat -> projektAggregat.projekt?.projekt?.technischerPlatz?.take(7) },
{ projektAggregat, wirtschaftseinheit ->
if (wirtschaftseinheit != null) {
projektAggregat + wirtschaftseinheit
} else {
logger().error("No wirtschaftseinheit found for $projektAggregat")
projektAggregat
}
},
)
.toTable(Materialized.`as`(projektWirtschaftseinheitJoinStoreSupplier))
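For readers unfamiliar with the difference, here is a minimal plain-Java sketch (no Kafka dependency) of what a stream-to-GlobalKTable left join does conceptually: every instance holds the complete table, a key mapper derives the lookup key from the stream record, and a missing match hands null to the joiner. The class and method names (GlobalLookupJoinSketch, leftJoin, demo) and the sample data are made up for illustration; this is not the Kafka Streams API and it says nothing about why the original KTable join returned null.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Conceptual model only: hypothetical names, not the Kafka Streams API.
class GlobalLookupJoinSketch {

    // Left join of stream values against a fully replicated lookup table.
    // keyMapper derives the table key from each stream value; the joiner
    // receives null for the table side when no entry exists.
    static <V, K, T, R> List<R> leftJoin(List<V> stream,
                                         Map<K, T> globalTable,
                                         Function<V, K> keyMapper,
                                         BiFunction<V, T, R> joiner) {
        List<R> out = new ArrayList<>();
        for (V value : stream) {
            K lookupKey = keyMapper.apply(value);
            T match = lookupKey == null ? null : globalTable.get(lookupKey);
            out.add(joiner.apply(value, match));
        }
        return out;
    }

    // Sample data is invented; the 7-character prefix mirrors take(7) above.
    static List<String> demo() {
        Map<String, String> table = new HashMap<>();
        table.put("1234567", "WE-A");
        List<String> technischerPlatz = List.of("1234567-001", "7654321-002");
        return leftJoin(
                technischerPlatz,
                table,
                tp -> tp.substring(0, 7),
                (tp, we) -> tp + " -> " + (we == null ? "no match" : we));
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The point of the sketch is that the lookup key does not have to be the stream record's own key, which is why the key mapper in the GlobalKTable version takes both key and value.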

Related

Flutter Stream RxDart combinelatest get data from stream listen

I have the stream below, which combines all user data using RxDart's combineLatest functions:
Stream<List<User>?> getallUserList() {
return getUsers().map((userList) {
return userList?.map((_userDetails) {
return Rx.combineLatest3(getUserByEmail(userEmail: _userDetails.email!), getKUserByIdEmail(userEmail: _userDetails.email!),
getUsers(userEmail: _userDetails.email!), (User? a, XUser? b, YUser? c) {
_repository
.getSummary(userKey: c!.userKey!)
.listen((summary) {
if (summary != null) {
_userDetails.titles = summary.title!;
}
});
_userDetails.xUser = b;
_userDetails.yUser = c;
return _userDetails;
});
});
}).switchMap((observables) {
return observables != null && observables.isNotEmpty ? Rx.combineLatestList(observables) : Stream.value([]);
});}
In the code above I call another stream, getSummary, to get the user's summary title. getSummary itself combines several other streams to produce the title:
Stream<UserSummary?> getSummary({required String userKey}) {
UserSummary? userSummary;
return Rx.combineLatest4(
Stream1,
Stream2,
Stream3,
Stream4,
(a, b, c, d) {
if (a != null) {
List<userSummaryList>? getUserSummaryList = d;
for (int responseIdx = 0; responseIdx < details!.length; responseIdx++) {
getSummaryList = getSummaryList!.where((list) => list.titleId!.contains(title[responseIdx]!)).toList();
}
userSummary = (getSummaryList!.isNotEmpty ? getSummaryList.first : []) as userSummary?;
return userSummary;
} else {
return userSummary;
}
});}
My issue arises when I call getSummary from getallUserList and read its data with listen: I am unable to use the value received in listen outside the callback to update _userDetails.titles. How can I access the stream's value outside the listen callback?

How to avoid nested subscription

I have this method deleteFeedTable() which returns a Completable and when it finishes I want to start another Disposable.
What I did is combine the two using operator concatWith, but this results in a nested subscription and I'd like to avoid that.
disposables.add(
localDataSource.deleteFeedTable()
.doOnComplete(() -> { preferencesManager.setFeedTableUpdateState(false);
})
.concatWith(new Completable() {
@Override
protected void subscribeActual(CompletableObserver s) {
s.onSubscribe(localDataSource.getLastStoredId()
.flatMap(lastStoredId -> remoteDataSource.getFeed(lastStoredId))
.doOnNext(feedItemList -> localDataSource.saveFeed(feedItemList))
.map(feedItemList -> {
Timber.i("MESA STO MAP");
List<Feed> feedList = new ArrayList<>();
for (FeedItem feedItem : feedItemList) {
feedList.add(mapper.from(feedItem));
}
downloadImageUseCase.downloadPhotos(feedList);
return feedList;
})
.subscribe());
}
})
.subscribeOn(schedulerProvider.io())
.observeOn(schedulerProvider.mainThread())
.subscribe(() -> {}, throwable -> Log.i("THROW", "loadData ", throwable)));
Is there a way to avoid the nested subscription? Or is there another way to add it to the disposables variable so I can clear the subscription later?
Use andThen:
disposables.add(
localDataSource.deleteFeedTable()
.doOnComplete(() -> {
preferencesManager.setFeedTableUpdateState(false);
})
.andThen(
localDataSource.getLastStoredId()
.flatMap(lastStoredId -> remoteDataSource.getFeed(lastStoredId))
.doOnNext(feedItemList -> localDataSource.saveFeed(feedItemList))
.map(feedItemList -> {
Timber.i("MESA STO MAP");
List<Feed> feedList = new ArrayList<>();
for (FeedItem feedItem : feedItemList) {
feedList.add(mapper.from(feedItem));
}
downloadImageUseCase.downloadPhotos(feedList);
return feedList;
})
)
.subscribeOn(schedulerProvider.io())
.observeOn(schedulerProvider.mainThread())
// andThen(...) yields the feed stream, so the first callback receives each list
.subscribe(feedList -> {}, throwable -> Log.i("THROW", "loadData ", throwable)));

How to ensure result count and caching with varying parameters

I have an API endpoint that can return a different number of results depending on the request parameters. The parameters are page, per_page, query and others.
fun getItems(params : Map<String, String>) : Single<ItemsResponse>
data class ItemsResponse(
val hasMore : Boolean,
val items : List<Items>
)
The API is not trustworthy and may return fewer than per_page items. I want to ensure that I always get the number of results I need and cache the remainder for the next request cycle.
For example, something like:
val page : Int = 1
fun fetchItems(requestedItems : Int = 20) : Single<List<Items>> {
...
.map { buildParams(page, perPage, query) }
.flatMap { api.getItems(it) }
.doOnSuccess { page++ }
.buffer(requestedItems)
}
fun buildParams(page: Int, perPage: Int, query : String) : Map<String, String> {
...
}
Example scenario:
Caller requests 20 items for the first time.
Call to api.getItems() with page: 1, per_page is always 20.
Call returns 16 items
Call to api.getItems() with page: 2
Call returns 19 items
20 items were returned to caller and 15 remaining items were cached for next caller request.
Caller requests 20 items for 2nd time.
Call to api.getItems() with page: 3
Call returns 12 items
20 items were returned to caller (15 older ones and 5 from last response) and 7 remaining items were cached for next caller requests.
And so on and so forth.
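The scenario above can be sketched without any reactive library as a small buffering helper: keep fetching pages until the cache holds the requested number of items, hand out exactly that many, and leave the rest cached. Everything here (BufferedPager, fakeApi, the 16/19/12 page sizes, and the rule that an empty page means no more data) is a hypothetical stand-in for the real API, which signals the end through hasMore instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper: names are illustrative and not part of any library.
class BufferedPager<T> {
    private final IntFunction<List<T>> fetchPage; // page number -> items (may be short)
    private final Deque<T> cache = new ArrayDeque<>();
    private int nextPage = 1;
    private boolean exhausted = false;

    BufferedPager(IntFunction<List<T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    // Returns up to `requested` items, fetching pages until enough are buffered.
    // Leftover items stay in the cache for the next call.
    List<T> next(int requested) {
        while (cache.size() < requested && !exhausted) {
            List<T> page = fetchPage.apply(nextPage++);
            if (page.isEmpty()) {
                exhausted = true; // assumption: an empty page means no more data
            } else {
                cache.addAll(page);
            }
        }
        List<T> result = new ArrayList<>();
        while (result.size() < requested && !cache.isEmpty()) {
            result.add(cache.poll());
        }
        return result;
    }

    int cached() {
        return cache.size();
    }

    // Simulated API returning 16, 19 and 12 items for pages 1 to 3, then nothing.
    static List<String> fakeApi(int page) {
        int[] sizes = {16, 19, 12};
        List<String> items = new ArrayList<>();
        if (page <= sizes.length) {
            for (int i = 1; i <= sizes[page - 1]; i++) {
                items.add(page + "|" + i);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        BufferedPager<String> pager = new BufferedPager<>(BufferedPager::fakeApi);
        System.out.println(pager.next(20).size() + " returned, " + pager.cached() + " cached");
        System.out.println(pager.next(20).size() + " returned, " + pager.cached() + " cached");
    }
}
```

With the simulated page sizes, the two calls leave 15 and then 7 items cached, matching the numbers in the scenario. The sketch is synchronous and single-threaded; a reactive version would need to serialize access to the cache.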
This looks like the Producer-Consumer pattern, but is it doable in RxJava2?
Edit: based on the additional info
Requires: RxJava 2 Extensions library: compile "com.github.akarnokd:rxjava2-extensions:0.17.0"
import hu.akarnokd.rxjava2.expr.StatementObservable
import io.reactivex.Observable
import io.reactivex.functions.BooleanSupplier
import io.reactivex.subjects.PublishSubject
import java.util.concurrent.Callable
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.ThreadLocalRandom
var counter = 0;
fun service() : Observable<String> {
return Observable.defer(Callable {
val n = ThreadLocalRandom.current().nextInt(21)
val c = counter++;
Observable.range(1, n).map({ v -> "" + c + " | " + v })
})
}
fun getPage(pageSignal : Observable<Int>, pageSize: Int) : Observable<List<String>> {
return Observable.defer(Callable {
val queue = ConcurrentLinkedQueue<String>()
pageSignal.concatMap({ _ ->
StatementObservable.whileDo(
service()
.toList()
.doOnSuccess({ v -> v.forEach { queue.offer(it) }})
.toObservable()
, BooleanSupplier { queue.size < pageSize })
.ignoreElements()
.andThen(
Observable.range(1, pageSize)
.concatMap({ _ ->
val o = queue.poll();
if (o == null) {
Observable.empty()
} else {
Observable.just(o)
}
})
.toList()
.toObservable()
)
})
})
}
fun main(args: Array<String>) {
val pages = PublishSubject.create<Int>();
getPage(pages, 20)
.subscribe({ println(it) }, { it.printStackTrace() })
pages.onNext(1)
pages.onNext(2)
}

Join an unknown number of sources when all sources contain given key

Given a source provider like below:
IObservable<ISource> Sources();
with each ISource looking like below:
IObservable<IEnumerable<string>> ObserveData(string filter)
I'd like to return:
IObservable<IEnumerable<string>> Results
when a given string is returned from all ISources. Essentially I want the intersection of all the sources.
If a new source is added then everything should re-evaluate.
I'm struggling to come up with a generic solution to this. Most solutions I've seen have a well known number of sources.
Any ideas appreciated.
Answer
Ok after thinking for a while longer I came up with my answer. Possibly it can be improved on but it seems to work for me so I'll post it here for reference in case someone else has a similar issue. Thanks to ibebbs and Shlomo for taking the time to reply, much appreciated.
//Arrange
var s1 = Substitute.For<ISource>();
s1.ObserveData(Arg.Any<string>()).Returns(Observable.Return(new[] { "a", "b", "c", "d" }));
var s2 = Substitute.For<ISource>();
s2.ObserveData(Arg.Any<string>()).Returns(Observable.Return(new[] { "b", "xx", "c", "d" }));
var s3 = Substitute.For<ISource>();
s3.ObserveData(Arg.Any<string>()).Returns(Observable.Return(new[] { "yy", "b", "ff", "d" }));
var expected = new[] { "b", "d" };
var sources = new[] { s1, s2, s3 }.ToObservable();
var scheduler = new TestScheduler();
var observer = scheduler.CreateObserver<IList<string>>();
//Act
sources.Buffer(TimeSpan.FromMilliseconds(500), scheduler)
.Select(s => Observable.CombineLatest(s.Select(x => x.ObserveData("NoFilter"))))
.Switch()
.Select(x =>IntersectAll(x))
.Do(x => Console.WriteLine($"Received {string.Join("," , x)}"))
.Subscribe(observer);
scheduler.AdvanceBy(TimeSpan.FromMilliseconds(500).Ticks);
//Assert
observer.Messages.AssertEqual(
OnNext<IList<string>>(0, s => s.SequenceEqual(expected)),
OnCompleted<IList<string>>(0));
For IntersectAll, see Intersection of multiple lists with IEnumerable.Intersect()
Ok, second attempt and I'm pretty sure this is what you need (test fixture included at the bottom):
public interface ISource
{
IObservable<IEnumerable<string>> ObserveData(string filter);
}
public static class ArbitrarySources
{
public static IObservable<IEnumerable<string>> Intersection(this IObservable<ISource> sourceObservable, string filter)
{
return sourceObservable
.SelectMany((source, index) => source.ObserveData(filter).Select(values => new { Index = index, Values = values }))
.Scan(ImmutableDictionary<int, IEnumerable<string>>.Empty, (agg, tuple) => agg.SetItem(tuple.Index, tuple.Values))
.Select(dictionary => dictionary.Values.Aggregate(Enumerable.Empty<string>(), (agg, values) => agg.Any() ? agg.Intersect(values) : values).ToArray());
}
}
public class IntersectionTest
{
internal class Source : ISource
{
private readonly IObservable<IEnumerable<string>> _observable;
public Source(IObservable<IEnumerable<string>> observable)
{
_observable = observable;
}
public IObservable<IEnumerable<string>> ObserveData(string filter)
{
return _observable;
}
}
[Fact]
public void ShouldIntersectValues()
{
TestScheduler scheduler = new TestScheduler();
var sourceA = new Source(scheduler.CreateColdObservable(
new Recorded<Notification<IEnumerable<string>>>(TimeSpan.FromSeconds(1).Ticks, Notification.CreateOnNext<IEnumerable<string>>(new string[] { "a", "b" })),
new Recorded<Notification<IEnumerable<string>>>(TimeSpan.FromSeconds(3).Ticks, Notification.CreateOnNext<IEnumerable<string>>(new string[] { "a", "b", "c" }))
));
var sourceB = new Source(scheduler.CreateColdObservable(
new Recorded<Notification<IEnumerable<string>>>(TimeSpan.FromSeconds(1).Ticks, Notification.CreateOnNext<IEnumerable<string>>(new string[] { "a", "c" })),
new Recorded<Notification<IEnumerable<string>>>(TimeSpan.FromSeconds(3).Ticks, Notification.CreateOnNext<IEnumerable<string>>(new string[] { "b", "c" }))
));
var sources = scheduler.CreateColdObservable(
new Recorded<Notification<ISource>>(TimeSpan.FromSeconds(1).Ticks, Notification.CreateOnNext<ISource>(sourceA)),
new Recorded<Notification<ISource>>(TimeSpan.FromSeconds(2).Ticks, Notification.CreateOnNext<ISource>(sourceB))
);
var observer = scheduler.Start(() => sources.Intersection("test"), 0, 0, TimeSpan.FromSeconds(6).Ticks);
IEnumerable<string>[] actual = observer.Messages
.Select(message => message.Value)
.Where(notification => notification.Kind == NotificationKind.OnNext && notification.HasValue)
.Select(notification => notification.Value)
.ToArray();
IEnumerable<string>[] expected = new []
{
new [] { "a", "b" },
new [] { "a" },
new [] { "a", "c" },
new [] { "b", "c" }
};
Assert.Equal(expected.Length, actual.Length);
foreach (var tuple in expected.Zip(actual, (e, a) => new { Expected = e, Actual = a }))
{
Assert.Equal(tuple.Expected, tuple.Actual);
}
}
}
This approach has the added benefit of not re-querying existing sources when a new source is added but will recompute the intersection each time any source emits a value.
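To make the state handling explicit, here is a rough plain-Java model of the Scan step: remember the latest list per source index and recompute the intersection on every emission. LatestIntersection and update are hypothetical names, and the sketch is synchronous, so it only illustrates the bookkeeping and not the Rx operators themselves.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of the Scan step: names are illustrative only.
class LatestIntersection {
    // Latest value list per source index.
    private final Map<Integer, List<String>> latest = new TreeMap<>();

    // Records the newest emission of one source and returns the
    // intersection across all sources that have emitted so far.
    List<String> update(int sourceIndex, List<String> values) {
        latest.put(sourceIndex, values);
        Set<String> result = null;
        for (List<String> v : latest.values()) {
            if (result == null) {
                result = new LinkedHashSet<>(v); // start from the first source
            } else {
                result.retainAll(v);             // narrow down with each further source
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        LatestIntersection state = new LatestIntersection();
        // Same emission order as in the test fixture above.
        System.out.println(state.update(0, List.of("a", "b")));
        System.out.println(state.update(1, List.of("a", "c")));
        System.out.println(state.update(0, List.of("a", "b", "c")));
        System.out.println(state.update(1, List.of("b", "c")));
    }
}
```

Fed the emissions from the fixture in the same order, this produces the four lists the test expects. One difference from the C# version: because there is no agg.Any() check here, an intermediate intersection that becomes empty stays empty instead of restarting from the next source's values.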
How about this:
public static IObservable<IEnumerable<string>> From(this IObservable<ISource> sources, string filter)
{
return sources
.Scan(Observable.Empty<IEnumerable<string>>(), (agg, source) => Observable.Merge(agg, source.ObserveData(filter)))
.Switch();
}
Be aware that every time a new source is emitted from sources, all previously emitted sources will have their ObserveData method called again. This solution therefore doesn't scale particularly well, but it does meet your 'If a new source is added then everything should re-evaluate' requirement.

CombineLatest, but only push for the left

I need to implement a version of CombineLatest (I'll call it WithLatest here) that calls the selector for every item on the left and the latest item on the right. It shouldn't push for items on the right changing only.
I think whether this is built with Observable.Create or a combination of existing extensions is not particularly important; I'll be making this a "boxed" extension method either way.
Example
var left = new Subject<int>();
var right = new Subject<int>();
left.WithLatest(right, (l,r) => l + " " + r).Dump();
left.OnNext(1); // <1>
left.OnNext(2); // <2>
right.OnNext(1); // <3>
right.OnNext(2); // <4>
left.OnNext(3); // <5>
should yield
2 1
3 2
Edit: The logic of my example goes:
Left becomes populated with 1. Right is empty, no values pushed.
Left becomes updated with 2 (it forgets the previous value). Right is still empty, so nothing is pushed.
Right becomes populated with 1, so Left = 2 (the latest value), Right = 1 is pushed. Up to this point, there is no difference between WithLatest and CombineLatest
Right is updated -- nothing is pushed. This is what's different
Left is updated with 3, so Left = 3, Right = 2 (the latest value) is pushed.
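The five steps above can be modelled in a few lines of plain Java, which may help when comparing candidate implementations. This is a single-threaded sketch of the semantics only, with hypothetical names (WithLatestSketch, onLeft, onRight, demo); it is not an Rx operator and ignores completion, errors and concurrency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical, single-threaded model of the semantics described above.
class WithLatestSketch<L, R, T> {
    private final BiFunction<L, R, T> selector;
    private final List<T> output = new ArrayList<>();
    private L latestLeft;
    private R latestRight;
    private boolean hasLeft;
    private boolean hasRight;

    WithLatestSketch(BiFunction<L, R, T> selector) {
        this.selector = selector;
    }

    // Every left item pushes, provided the right side already has a value.
    void onLeft(L value) {
        latestLeft = value;
        hasLeft = true;
        if (hasRight) {
            output.add(selector.apply(latestLeft, latestRight));
        }
    }

    // A right item pushes only if it is the first one and a left value is waiting.
    void onRight(R value) {
        boolean first = !hasRight;
        latestRight = value;
        hasRight = true;
        if (first && hasLeft) {
            output.add(selector.apply(latestLeft, latestRight));
        }
    }

    List<T> output() {
        return output;
    }

    // Replays the event sequence <1> to <5> from the example.
    static List<String> demo() {
        WithLatestSketch<Integer, Integer, String> w =
                new WithLatestSketch<>((l, r) -> l + " " + r);
        w.onLeft(1);
        w.onLeft(2);
        w.onRight(1);
        w.onRight(2);
        w.onLeft(3);
        return w.output();
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Replaying the example sequence yields the two pairs listed above. Dropping the `first && hasLeft` branch in onRight gives the stricter behaviour in which only left items ever push.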
It's been suggested that I try:
var lr = right.ObserveOn(Scheduler.TaskPool).Latest();
left.Select(l => l + " " + lr.First()).Dump();
but this blocks on the current thread for my test.
You can do this using existing operators.
Func<int, int, string> selector = (l, r) => l + " " + r;
var query = right.Publish(rs => left.Zip(rs.MostRecent(0), selector).SkipUntil(rs));
Publish ensures we only ever subscribe to right once and share the subscription among all subscribers to rs.
MostRecent turns an IObservable<T> into an IEnumerable<T> that always yields the most recently emitted value from the source observable.
Zip between IObservable<T> and IEnumerable<U> emits a value each time the observable emits a value.
SkipUntil skips the pairs (l, r) which occur before right ever emits a value.
I also had the same need for a CombineLatest which "pushes only for the left".
I made the solution an "overload" of Observable.Sample, because that's what the method does:
It samples a source (right) with a sampler (left), with the additional capability of providing a resultSelector (like in CombineLatest).
public static IObservable<TResult> Sample<TSource, TSample, TResult>(
this IObservable<TSource> source,
IObservable<TSample> sampler,
Func<TSource, TSample, TResult> resultSelector)
{
var multiSampler = sampler.Publish().RefCount();
return source.CombineLatest(multiSampler, resultSelector).Sample(multiSampler);
}
Based on the solution picked by the post author I think there's an even simpler solution utilizing DistinctUntilChanged:
public static IObservable<TResult> CombineLatestOnLeft<TLeft, TRight, TResult>(this IObservable<TLeft> leftSource, IObservable<TRight> rightSource, Func<TLeft, TRight, TResult> selector) {
return leftSource
.Select<TLeft, Tuple<TLeft, int>>(Tuple.Create<TLeft, int>)
.CombineLatest(rightSource,
(l, r) => new { Index = l.Item2, Left = l.Item1, Right = r })
.DistinctUntilChanged(x => x.Index)
.Select(x => selector(x.Left, x.Right));
}
or even
public static IObservable<TResult> CombineLatestOnLeft<TLeft, TRight, TResult>(this IObservable<TLeft> leftSource, IObservable<TRight> rightSource, Func<TLeft, TRight, TResult> selector) {
return leftSource
.CombineLatest(rightSource,
(l, r) => new { Left = l, Right = r })
.DistinctUntilChanged(x => x.Left)
.Select(x => selector(x.Left, x.Right));
}
if you only care about distinct values of leftSource
With a recent System.Reactive, we can use the WithLatestFrom extension method.
left.WithLatestFrom(right, (l, r) => l + " " + r).Dump();
For the example above this yields only the following, because WithLatestFrom emits solely when left emits:
3 2
Here's the hacky way using Create - didn't actually build it, mea culpa if it doesn't actually work :)
public static IObservable<TRet> WithLatest<TLeft, TRight, TRet>(
this IObservable<TLeft> lhs,
IObservable<TRight> rhs,
Func<TLeft, TRight, TRet> sel)
{
return Observable.Create<TRet>(subj => {
bool rhsSet = false;
bool deaded = false;
var latestRhs = default(TRight);
Action onDeaded = null;
var rhsDisp = rhs.Subscribe(
x => { latestRhs = x; rhsSet = true; },
ex => { subj.OnError(ex); onDeaded(); });
var lhsDisp = lhs
.Where(_ => deaded == false && rhsSet == true)
.Subscribe(
x => subj.OnNext(sel(x, latestRhs)),
ex => { subj.OnError(ex); onDeaded(); },
() => { subj.OnCompleted(); onDeaded(); });
onDeaded = () => {
deaded = true;
if (lhsDisp != null) {
lhsDisp.Dispose();
lhsDisp = null;
}
if (rhsDisp != null) {
rhsDisp.Dispose();
rhsDisp = null;
}
};
return onDeaded;
});
}
I wrote an Rx operator for a project today that does this.
Here's my solution:
public static IObservable<Tuple<TSource, TTarget>> JoinLeftSoft<TSource, TTarget>(
this IObservable<TSource> source, IObservable<TTarget> right)
{
return source
.Select(x => new Tuple<object, TSource>(new object(), x))
.CombineLatest(right, (l, r) => new Tuple<object, TSource, TTarget>(l.Item1, l.Item2, r))
.DistinctUntilChanged(t => t.Item1)
.Select(t => new Tuple<TSource, TTarget>(t.Item2, t.Item3));
}